Threshold selection for classification with skewed class distribution

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

1 Scopus citations

Abstract

In document classification, threshold selection receives little attention, particularly in binary classification cases where threshold selection was largely ignored as a trivial task of a post-processing step. In webpage classification, however, we are facing a problem involving huge number of webpages usually with highly imbalanced class distribution. Due to the budget constraint, a reliable estimate of the threshold is required on only small size of human judged webpages. A good threshold selection criterion also need be adopted in highly imbalanced class distribution situation with positives being very spares in the sample set. These challenges make the threshold selection a non-trivial task for webpage classification. In this paper, we propose a novel cost efficient approach of threshold selection method for binary webpage classification with highly imbalanced class distribution. We construct a small sample set by applying stratified sampling on the webpages. The human judged samples are expanded to reflect the true class distribution of the webpage population. Experimental results show that false positive rate leads to more stable threshold estimate.

Original languageEnglish
Title of host publicationWeb-Age Information Management - WAIM 2013, International Workshops
Subtitle of host publicationHardBD, MDSP, BigEM, TMSN, LQPM, BDMS, Proceedings
Pages383-393
Number of pages11
DOIs
StatePublished - 2013
Event14th International Conference on Web-Age Information Management, WAIM 2013 - Beidaihe, China
Duration: 14 Jun 201316 Jun 2013

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume7901 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference14th International Conference on Web-Age Information Management, WAIM 2013
Country/TerritoryChina
CityBeidaihe
Period14/06/1316/06/13

Fingerprint

Dive into the research topics of 'Threshold selection for classification with skewed class distribution'. Together they form a unique fingerprint.

Cite this