Dynamic self-paced sampling ensemble for highly imbalanced and class-overlapped data classification

Fang Zhou*, Suting Gao, Lyu Ni, Martin Pavlovski, Qiwen Dong, Zoran Obradovic, Weining Qian

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

10 Scopus citations

Abstract

Datasets with imbalanced class distributions arise in various real-world applications. A great number of approaches have been proposed to address the class imbalance challenge, but most of these models perform poorly when datasets are characterized by high class imbalance, class overlap, and low data quality. In this study, we propose an effective meta-framework for highly imbalanced and class-overlapped classification, called DAPS (DynAmic self-Paced sampling enSemble), which (1) leverages reasonable and effective sampling to maximize the utilization of informative instances and to avoid serious information loss and (2) assigns proper instance weights to address the issue of noisy data. Furthermore, most existing canonical classifiers (e.g., Decision Tree, Random Forest) can be integrated into DAPS. Comprehensive experimental results on both synthetic and three real-world datasets show that the DAPS model obtains considerable improvements in F1-score compared with a broad range of published models.
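The abstract reports its gains in F1-score rather than accuracy. As a rough illustration of why that choice matters under high class imbalance, the sketch below computes both metrics on a hypothetical toy label set (990 majority and 10 minority instances). The data, the function names, and the two example predictors are assumptions made for illustration only; they are not taken from the paper and do not implement DAPS.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for the `positive` (minority) class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and/or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical toy data: 990 majority (0) and 10 minority (1) instances.
y_true = [0] * 990 + [1] * 10

# Predictor A (hypothetical): always predicts the majority class.
always_majority = [0] * 1000

# Predictor B (hypothetical): raises 5 false alarms on majority instances,
# but recovers 5 of the 10 minority instances.
partial = [1] * 5 + [0] * 985 + [1] * 5 + [0] * 5

for name, y_pred in [("always_majority", always_majority), ("partial", partial)]:
    print(name, "accuracy:", accuracy(y_true, y_pred), "F1:", f1_score(y_true, y_pred))
```

Both hypothetical predictors make the same number of errors and so have the same accuracy, yet only the second one finds any minority instances, and only F1 reflects that difference.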

Original language: English
Pages (from-to): 1601-1622
Number of pages: 22
Journal: Data Mining and Knowledge Discovery
Volume: 36
Issue number: 5
State: Published - Sep 2022

Keywords

  • Class-overlapped data
  • Dynamic self-paced sampling
  • Highly class imbalance
