
Proposing a new term weighting scheme for text categorization

  • Man Lan*
  • Chew Lim Tan
  • Hwee Boon Low

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

Abstract

In text categorization, term weighting methods assign appropriate weights to terms to improve classification performance. In this study, we propose an effective term weighting scheme, tf.rf, and investigate several widely used unsupervised and supervised term weighting methods on two popular data collections in combination with the SVM and kNN algorithms. Our controlled experimental results show that not all supervised term weighting methods are consistently superior to unsupervised ones. Specifically, the three supervised methods based on information theory, tf.χ², tf.ig and tf.or, perform rather poorly in all experiments. In contrast, our proposed tf.rf consistently achieves the best performance and outperforms the other methods substantially and significantly. The popular tf.idf method does not perform uniformly well across different data corpora.
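To make the unsupervised vs. supervised contrast concrete, here is a minimal sketch of one binary category (positive documents vs. the rest). It assumes the textbook idf variant log2(N / df) and the relevance-frequency formulation commonly cited for tf.rf, rf = log2(2 + a / max(1, c)), where a and c count the positive and negative documents containing the term; both formulas, the raw-count tf, the toy corpus and all function names (`idf`, `rf`, `tf_weight`) are illustrative assumptions and should be checked against the paper itself.

```python
import math

# Hypothetical toy corpus for ONE binary category (e.g. "sports" vs. rest).
# Each document is a list of tokens; True marks a positive document.
DOCS = [
    ["goal", "match", "goal"],
    ["match", "team"],
    ["market", "match"],
    ["market", "stock"],
]
LABELS = [True, True, False, False]


def idf(term, docs):
    """Unsupervised factor: log2(N / df), one common idf variant.

    Ignores class labels; assumes the term occurs in at least one document.
    """
    df = sum(1 for doc in docs if term in doc)
    return math.log2(len(docs) / df)


def rf(term, docs, labels):
    """Supervised factor (assumed form): log2(2 + a / max(1, c)).

    a = number of positive documents containing the term,
    c = number of negative documents containing the term.
    """
    a = sum(1 for doc, pos in zip(docs, labels) if pos and term in doc)
    c = sum(1 for doc, pos in zip(docs, labels) if not pos and term in doc)
    return math.log2(2 + a / max(1, c))


def tf_weight(term, doc, factor):
    """Raw term frequency in the document times a global factor (idf or rf)."""
    return doc.count(term) * factor


if __name__ == "__main__":
    for term in ["goal", "match", "market"]:
        print(f"{term:>6}  idf={idf(term, DOCS):.3f}  rf={rf(term, DOCS, LABELS):.3f}")
```

On this toy data idf ranks the rare term "goal" highest (2.000), whereas rf ranks "match" highest (2.000) because it occurs in twice as many positive as negative documents; "market" never occurs in a positive document and gets the floor value rf = 1.000. The key design difference is that idf ignores labels entirely, while rf uses them to reward terms concentrated in the positive category.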

Original language: English
Host publication title: Proceedings of the 21st National Conference on Artificial Intelligence and the 18th Innovative Applications of Artificial Intelligence Conference, AAAI-06/IAAI-06
Pages: 763-768
Number of pages: 6
Publication status: Published - 2006
Externally published: Yes
Event: 21st National Conference on Artificial Intelligence and the 18th Innovative Applications of Artificial Intelligence Conference, AAAI-06/IAAI-06 - Boston, MA, United States
Duration: 16 July 2006 to 20 July 2006

Publication series

Name: Proceedings of the National Conference on Artificial Intelligence
Volume: 1

Conference

Conference: 21st National Conference on Artificial Intelligence and the 18th Innovative Applications of Artificial Intelligence Conference, AAAI-06/IAAI-06
Country/Territory: United States
City: Boston, MA
Period: 16/07/06 to 20/07/06

