Text representations for text categorization: A case study in biomedical domain

  • Man Lan*
  • , Chew Lim Tan
  • , Jian Su
  • , Hwee Boon Low
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

15 Scopus citations

Abstract

In vector space model (VSM), textual documents are represented as vectors in the term space. Therefore, there are two issues in this representation, i.e. (1) what should a term be and (2) how to weight a term. This paper examined ways to represent text from the above two aspects to improve the performance of text categorization. Different representations have been evaluated using SVM on three biomedical corpora. The controlled experiments showed that the straightforward usage of named entities as terms in VSM does not show performance improvements over the bag-of-words representation. On the other hand, the term weighting method slightly improved the performance. However, to further improve the performance of text categorization, more advanced techniques and more effective usages of natural language processing for text representations appear needed.

Original languageEnglish
Title of host publicationThe 2007 International Joint Conference on Neural Networks, IJCNN 2007 Conference Proceedings
Pages2557-2562
Number of pages6
DOIs
StatePublished - 2007
Externally publishedYes
Event2007 International Joint Conference on Neural Networks, IJCNN 2007 - Orlando, FL, United States
Duration: 12 Aug 200717 Aug 2007

Publication series

NameIEEE International Conference on Neural Networks - Conference Proceedings
ISSN (Print)1098-7576

Conference

Conference2007 International Joint Conference on Neural Networks, IJCNN 2007
Country/TerritoryUnited States
CityOrlando, FL
Period12/08/0717/08/07

Fingerprint

Dive into the research topics of 'Text representations for text categorization: A case study in biomedical domain'. Together they form a unique fingerprint.

Cite this