TY - GEN
T1 - Text representations for text categorization
T2 - 2007 International Joint Conference on Neural Networks, IJCNN 2007
AU - Lan, Man
AU - Tan, Chew Lim
AU - Su, Jian
AU - Low, Hwee Boon
PY - 2007
Y1 - 2007
N2 - In the vector space model (VSM), textual documents are represented as vectors in the term space. This representation raises two issues: (1) what a term should be and (2) how to weight a term. This paper examines ways to represent text from these two aspects in order to improve the performance of text categorization. Different representations were evaluated using SVM on three biomedical corpora. The controlled experiments showed that the straightforward use of named entities as terms in VSM does not improve performance over the bag-of-words representation. On the other hand, the term weighting method slightly improved performance. However, to further improve the performance of text categorization, more advanced techniques and more effective uses of natural language processing for text representations appear to be needed.
UR - https://www.scopus.com/pages/publications/51749119679
U2 - 10.1109/IJCNN.2007.4371361
DO - 10.1109/IJCNN.2007.4371361
M3 - Conference contribution
AN - SCOPUS:51749119679
SN - 142441380X
SN - 9781424413805
T3 - IEEE International Conference on Neural Networks - Conference Proceedings
SP - 2557
EP - 2562
BT - The 2007 International Joint Conference on Neural Networks, IJCNN 2007 Conference Proceedings
Y2 - 12 August 2007 through 17 August 2007
ER -