A comprehensive comparative study on term weighting schemes for text categorization with support vector machines

Man Lan, Chew Lim Tan, Hwee Boon Low, Sam Yuan Sung

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

95 Scopus citations

Abstract

Term weighting, which converts documents into vectors in the term space, is a vital step in automatic text categorization. In this paper, we conducted comprehensive experiments comparing various term weighting schemes with SVM on two widely used benchmark data sets. We also present a new term weighting scheme, tf-rf, that improves a term's discriminating power. The controlled experimental results show that the newly proposed tf-rf scheme is significantly better than other widely used term weighting schemes. Compared with schemes based on the tf factor alone, adding the idf factor does not improve, and can even decrease, a term's discriminating power for text categorization.
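The abstract does not state the tf-rf formula. The Python below is a minimal sketch that assumes the relevance frequency definition commonly associated with tf-rf, rf = log2(2 + a / max(1, c)). Here a counts positive-category training documents containing the term and c counts negative-category documents containing it. The function names and toy data are hypothetical illustrations, not the authors' code. A plain tf-idf weight is included for contrast.

```python
import math
from collections import Counter

def relevance_frequency(term, docs, labels, positive):
    """Assumed rf definition: log2(2 + a / max(1, c)) for one category."""
    # a: documents in the positive category that contain the term
    a = sum(1 for d, y in zip(docs, labels) if y == positive and term in d)
    # c: documents outside the positive category that contain the term
    c = sum(1 for d, y in zip(docs, labels) if y != positive and term in d)
    return math.log2(2 + a / max(1, c))

def idf(term, docs):
    """Unsupervised idf: log(N / df); terms never seen get weight 0."""
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / df) if df else 0.0

def tf_rf_vector(doc, vocab, docs, labels, positive):
    """Weight each vocabulary term by raw tf times its (category-specific) rf."""
    counts = Counter(doc)
    return [counts[t] * relevance_frequency(t, docs, labels, positive) for t in vocab]

def tf_idf_vector(doc, vocab, docs):
    """Weight each vocabulary term by raw tf times idf, ignoring labels."""
    counts = Counter(doc)
    return [counts[t] * idf(t, docs) for t in vocab]

# Hypothetical toy corpus: tokenized documents with category labels.
docs = [["stock", "market", "rise"], ["stock", "price"],
        ["football", "match"], ["match", "stock"]]
labels = ["econ", "econ", "sport", "sport"]

print(tf_rf_vector(docs[0], ["stock", "football"], docs, labels, "econ"))  # [2.0, 0.0]
```

In this toy corpus, "stock" appears in three of four documents, so its idf is low. Its rf for the "econ" category is still 2.0 because it occurs mostly in positive documents. This illustrates why a supervised factor such as rf can separate categories where idf cannot.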

Original language: English
Title of host publication: 14th International World Wide Web Conference, WWW2005
Pages: 1032-1033
Number of pages: 2
State: Published, 2005
Externally published: Yes
Event: 14th International World Wide Web Conference, WWW2005, Chiba, Japan
Duration: 10 May 2005 to 14 May 2005

Publication series

Name: 14th International World Wide Web Conference, WWW2005

Conference

Conference: 14th International World Wide Web Conference, WWW2005
Country/Territory: Japan
City: Chiba
Period: 10/05/05 to 14/05/05

Keywords

  • Categorization
  • SVM
  • Term weighting schemes
  • Text
