TY - JOUR
T1 - Dynamic self-paced sampling ensemble for highly imbalanced and class-overlapped data classification
AU - Zhou, Fang
AU - Gao, Suting
AU - Ni, Lyu
AU - Pavlovski, Martin
AU - Dong, Qiwen
AU - Obradovic, Zoran
AU - Qian, Weining
N1 - Publisher Copyright:
© 2022, The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature.
PY - 2022/9
Y1 - 2022/9
N2 - Datasets with imbalanced class distributions arise in various real-world applications. A great number of approaches have been proposed to address the class imbalance challenge, but most of these models perform poorly when datasets are characterized by high class imbalance, class overlap and low data quality. In this study, we propose an effective meta-framework for highly imbalanced, class-overlapped classification, called DAPS (DynAmic self-Paced sampling enSemble), which (1) leverages reasonable and effective sampling to maximize the utilization of informative instances and to avoid serious information loss and (2) assigns proper instance weights to address the issues of noisy data. Furthermore, most existing canonical classifiers (e.g. Decision Tree, Random Forest) can be integrated into DAPS. Comprehensive experimental results on both synthetic and three real-world datasets show that the DAPS model could obtain considerable improvements in F1-score when compared to a broad range of published models.
KW - Class-overlapped data
KW - Dynamic self-paced sampling
KW - High class imbalance
UR - https://www.scopus.com/pages/publications/85132141549
U2 - 10.1007/s10618-022-00838-z
DO - 10.1007/s10618-022-00838-z
M3 - Article
AN - SCOPUS:85132141549
SN - 1384-5810
VL - 36
SP - 1601
EP - 1622
JO - Data Mining and Knowledge Discovery
JF - Data Mining and Knowledge Discovery
IS - 5
ER -