A comparative study on term weighting schemes for text categorization

  • Man Lan*
  • Sam Yuan Sung
  • Hwee Boon Low
  • Chew Lim Tan
  • *Corresponding author for this work
  • National University of Singapore
  • Agency for Science, Technology and Research, Singapore

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

The term weighting scheme, which converts documents into vectors in the term space, is a vital step in automatic text categorization. Previous studies showed that term weighting schemes, rather than the kernel functions of SVMs, dominate performance on the text categorization task. In this paper, we conducted experiments to compare various term weighting schemes with SVM on two widely used benchmark data sets. We also presented a new term weighting scheme, tf.rf, for text categorization. The cross-scheme comparison was performed using McNemar's tests. The controlled experimental results showed that the newly proposed tf.rf scheme is significantly better than the other term weighting schemes. Compared with schemes based on the tf factor alone, the idf factor does not improve, and may even decrease, the term's discriminating power for text categorization. The binary and tf.chi representations significantly underperform the other term weighting schemes.

Original language: English
Title of host publication: Proceedings of the International Joint Conference on Neural Networks, IJCNN 2005
Pages: 546-551
Number of pages: 6
Publication status: Published - 2005
Externally published
Event: International Joint Conference on Neural Networks, IJCNN 2005 - Montreal, QC, Canada
Duration: 31 Jul 2005 to 4 Aug 2005

Publication series

Name: Proceedings of the International Joint Conference on Neural Networks
Volume: 1

Conference

Conference: International Joint Conference on Neural Networks, IJCNN 2005
Country/Territory: Canada
City: Montreal, QC
Period: 31/07/05 to 4/08/05
