TY - GEN
T1 - Threshold selection for classification with skewed class distribution
AU - He, Xiaofeng
AU - Zhang, Rong
AU - Zhou, Aoying
PY - 2013
Y1 - 2013
N2 - In document classification, threshold selection receives little attention, particularly in binary classification, where it has largely been dismissed as a trivial post-processing step. In webpage classification, however, we face a problem involving a huge number of webpages, usually with a highly imbalanced class distribution. Because of budget constraints, a reliable estimate of the threshold must be obtained from only a small set of human-judged webpages. A good threshold selection criterion must also be adopted for the highly imbalanced setting, in which positives are very sparse in the sample set. These challenges make threshold selection a non-trivial task for webpage classification. In this paper, we propose a novel, cost-efficient threshold selection method for binary webpage classification with highly imbalanced class distribution. We construct a small sample set by applying stratified sampling to the webpages. The human-judged samples are then expanded to reflect the true class distribution of the webpage population. Experimental results show that the false positive rate leads to a more stable threshold estimate.
AB - In document classification, threshold selection receives little attention, particularly in binary classification, where it has largely been dismissed as a trivial post-processing step. In webpage classification, however, we face a problem involving a huge number of webpages, usually with a highly imbalanced class distribution. Because of budget constraints, a reliable estimate of the threshold must be obtained from only a small set of human-judged webpages. A good threshold selection criterion must also be adopted for the highly imbalanced setting, in which positives are very sparse in the sample set. These challenges make threshold selection a non-trivial task for webpage classification. In this paper, we propose a novel, cost-efficient threshold selection method for binary webpage classification with highly imbalanced class distribution. We construct a small sample set by applying stratified sampling to the webpages. The human-judged samples are then expanded to reflect the true class distribution of the webpage population. Experimental results show that the false positive rate leads to a more stable threshold estimate.
UR - https://www.scopus.com/pages/publications/84893122206
U2 - 10.1007/978-3-642-39527-7_37
DO - 10.1007/978-3-642-39527-7_37
M3 - Conference contribution
AN - SCOPUS:84893122206
SN - 9783642395260
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 383
EP - 393
BT - Web-Age Information Management - WAIM 2013, International Workshops
T2 - 14th International Conference on Web-Age Information Management, WAIM 2013
Y2 - 14 June 2013 through 16 June 2013
ER -