TY - GEN
T1 - Proposing a new term weighting scheme for text categorization
AU - Lan, Man
AU - Tan, Chew Lim
AU - Low, Hwee Boon
PY - 2006
Y1 - 2006
N2 - In text categorization, term weighting methods assign appropriate weights to the terms to improve the classification performance. In this study, we propose an effective term weighting scheme, i.e. tf.rf, and investigate several widely-used unsupervised and supervised term weighting methods on two popular data collections in combination with SVM and kNN algorithms. From our controlled experimental results, not all supervised term weighting methods have a consistent superiority over unsupervised term weighting methods. Specifically, the three supervised methods based on the information theory, i.e. tf.χ², tf.ig and tf.or, perform rather poorly in all experiments. On the other hand, our proposed tf.rf achieves the best performance consistently and outperforms other methods substantially and significantly. The popularly-used tf.idf method has not shown a uniformly good performance with respect to different data corpora.
AB - In text categorization, term weighting methods assign appropriate weights to the terms to improve the classification performance. In this study, we propose an effective term weighting scheme, i.e. tf.rf, and investigate several widely-used unsupervised and supervised term weighting methods on two popular data collections in combination with SVM and kNN algorithms. From our controlled experimental results, not all supervised term weighting methods have a consistent superiority over unsupervised term weighting methods. Specifically, the three supervised methods based on the information theory, i.e. tf.χ², tf.ig and tf.or, perform rather poorly in all experiments. On the other hand, our proposed tf.rf achieves the best performance consistently and outperforms other methods substantially and significantly. The popularly-used tf.idf method has not shown a uniformly good performance with respect to different data corpora.
UR - https://www.scopus.com/pages/publications/33750835057
M3 - Conference contribution
AN - SCOPUS:33750835057
SN - 1577352815
SN - 9781577352815
T3 - Proceedings of the National Conference on Artificial Intelligence
SP - 763
EP - 768
BT - Proceedings of the 21st National Conference on Artificial Intelligence and the 18th Innovative Applications of Artificial Intelligence Conference, AAAI-06/IAAI-06
T2 - 21st National Conference on Artificial Intelligence and the 18th Innovative Applications of Artificial Intelligence Conference, AAAI-06/IAAI-06
Y2 - 16 July 2006 through 20 July 2006
ER -