TY - GEN
T1 - Automatic filtering algorithm for imbalanced classification
AU - Gong, Wei
AU - Zhou, Youjie
AU - Luo, Hangzai
AU - Fan, Jianping
AU - Zhou, Aoying
PY - 2010
Y1 - 2010
N2 - Imbalanced data sets have been reported to hinder the classification performance of many machine learning algorithms in terms of both accuracy and speed. However, extremely imbalanced data sets (3-5% positive samples) are common in many applications, such as multimedia semantic classification. In this paper, we propose a novel algorithm that automatically removes, from imbalanced training data sets, the samples that have no effect or a negative effect on classifier training. Our algorithm automatically eliminates most of the easy-to-classify dominant-class samples in an imbalanced training set. As a result, the proportion of minority-class samples increases significantly, making the training set more suitable for classification algorithms. Experiments show that our algorithm preserves the classification accuracy of SVM while dramatically reducing training time.
AB - Imbalanced data sets have been reported to hinder the classification performance of many machine learning algorithms in terms of both accuracy and speed. However, extremely imbalanced data sets (3-5% positive samples) are common in many applications, such as multimedia semantic classification. In this paper, we propose a novel algorithm that automatically removes, from imbalanced training data sets, the samples that have no effect or a negative effect on classifier training. Our algorithm automatically eliminates most of the easy-to-classify dominant-class samples in an imbalanced training set. As a result, the proportion of minority-class samples increases significantly, making the training set more suitable for classification algorithms. Experiments show that our algorithm preserves the classification accuracy of SVM while dramatically reducing training time.
UR - https://www.scopus.com/pages/publications/78649250618
U2 - 10.1109/FSKD.2010.5569437
DO - 10.1109/FSKD.2010.5569437
M3 - Conference contribution
AN - SCOPUS:78649250618
SN - 9781424459346
T3 - Proceedings - 2010 7th International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2010
SP - 1853
EP - 1857
BT - Proceedings - 2010 7th International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2010
T2 - 2010 7th International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2010
Y2 - 10 August 2010 through 12 August 2010
ER -