TY - JOUR
T1 - Boosting prediction performance on imbalanced dataset
AU - Zareapoor, Masoumeh
AU - Shamsolmoali, Pourya
N1 - Publisher Copyright: Copyright © 2018 Inderscience Enterprises Ltd.
PY - 2018
Y1 - 2018
N2 - Mining imbalanced data is an important problem for both algorithm design and performance evaluation. When a dataset is imbalanced, a classifier does not treat the two classes equally. Standard classifiers are poorly suited to imbalanced data because they tend to assign all instances to the majority class, which is the less important class. In addition, some performance measures, such as accuracy, are known to be biased on imbalanced data and give a misleading picture of classifier quality. In this paper, we apply several commonly used techniques for handling class imbalance before passing the data to the classifiers. Even so, classifier performance degrades because the datasets are highly imbalanced. We therefore propose an integrated sampling technique combined with an AdaBoost ensemble to improve prediction performance. We also show empirically which performance measures are more appropriate for mining imbalanced datasets.
AB - Mining imbalanced data is an important problem for both algorithm design and performance evaluation. When a dataset is imbalanced, a classifier does not treat the two classes equally. Standard classifiers are poorly suited to imbalanced data because they tend to assign all instances to the majority class, which is the less important class. In addition, some performance measures, such as accuracy, are known to be biased on imbalanced data and give a misleading picture of classifier quality. In this paper, we apply several commonly used techniques for handling class imbalance before passing the data to the classifiers. Even so, classifier performance degrades because the datasets are highly imbalanced. We therefore propose an integrated sampling technique combined with an AdaBoost ensemble to improve prediction performance. We also show empirically which performance measures are more appropriate for mining imbalanced datasets.
KW - Classification
KW - Ensemble
KW - Imbalanced dataset
KW - Re-sampling
UR - https://www.scopus.com/pages/publications/85044647642
U2 - 10.1504/IJICT.2018.090556
DO - 10.1504/IJICT.2018.090556
M3 - Article
AN - SCOPUS:85044647642
SN - 1466-6642
VL - 13
SP - 186
EP - 195
JO - International Journal of Information and Communication Technology
JF - International Journal of Information and Communication Technology
IS - 2
ER -