Proposing a new term weighting scheme for text categorization

  • Man Lan*
  • Chew Lim Tan
  • Hwee Boon Low
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

71 Scopus citations

Abstract

In text categorization, term weighting methods assign weights to terms in order to improve classification performance. In this study, we propose an effective term weighting scheme, tf.rf, and investigate several widely used unsupervised and supervised term weighting methods on two popular data collections in combination with the SVM and kNN algorithms. Our controlled experiments show that not all supervised term weighting methods are consistently superior to unsupervised ones. Specifically, the three supervised methods based on information theory, namely tf.χ², tf.ig and tf.or, perform rather poorly in all experiments. In contrast, our proposed tf.rf consistently achieves the best performance and outperforms the other methods substantially and significantly. The popular tf.idf method does not perform uniformly well across different data corpora.

Original language: English
Title of host publication: Proceedings of the 21st National Conference on Artificial Intelligence and the 18th Innovative Applications of Artificial Intelligence Conference, AAAI-06/IAAI-06
Pages: 763-768
Number of pages: 6
State: Published - 2006
Externally published: Yes
Event: 21st National Conference on Artificial Intelligence and the 18th Innovative Applications of Artificial Intelligence Conference, AAAI-06/IAAI-06 - Boston, MA, United States
Duration: 16 Jul 2006 to 20 Jul 2006

Publication series

Name: Proceedings of the National Conference on Artificial Intelligence
Volume: 1

Conference

Conference: 21st National Conference on Artificial Intelligence and the 18th Innovative Applications of Artificial Intelligence Conference, AAAI-06/IAAI-06
Country/Territory: United States
City: Boston, MA
Period: 16/07/06 to 20/07/06
