
The integration of multiple feature representations for protein protein interaction classification task

  • Man Lan*
  • Chew Lim Tan
*Corresponding author for this work

Research output: Contribution to journal › Conference article › Peer-review

Abstract

Background: Automatically detecting articles relevant to protein-protein interactions (PPI) for database curation is a crucial step in extracting and retrieving PPI information from text. The vast majority of research on this task has used the "bag-of-words" representation, in which each feature corresponds to a single word. To capture information that this simple bag-of-words representation leaves out, we examined alternative ways to represent text based on advanced natural language processing techniques (protein named entities) and biological domain knowledge (trigger keywords).

Results: These feature representations were evaluated using an SVM classifier on the BioCreAtIvE II benchmark corpus. On their own, the new representations were not found to produce a significant performance improvement according to statistical significance tests. However, the performance achieved by integrating 70 trigger keyword features and 4 protein named entity features was comparable to that achieved using bag-of-words alone. In addition, the 4 protein named entity features alone (4PNE) obtained the best recall (98.13%).

Conclusions: In general, our work supports the view that more sophisticated natural language processing (NLP) techniques, and more advanced use of these techniques, need to be developed before better text representations can be produced. Feature representations based on simple NLP techniques would benefit real-life detection systems that must run with great efficiency and speed without losing classification performance, as well as exhaustive curation systems.
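The abstract describes building several text representations and integrating them as input to an SVM. The sketch below illustrates that idea in plain Python, assuming that "integration" means concatenating per-document feature vectors (the abstract does not state the exact scheme). The keyword list `TRIGGER_KEYWORDS`, the prefix-matching rule, and the features in `protein_entity_features` are hypothetical stand-ins, not the 70 trigger keywords or the 4 protein named entity features used in the study. Protein mentions are assumed to come from an external named entity tagger, and training the SVM itself is not shown.

```python
import re
from collections import Counter

# Hypothetical trigger keywords; NOT the study's actual list of 70.
TRIGGER_KEYWORDS = ["interact", "bind", "complex", "associate", "phosphorylate"]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_of_words(tokens, vocabulary):
    """Bag-of-words: one count feature per vocabulary word."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def trigger_features(tokens, keywords=TRIGGER_KEYWORDS):
    """One count per trigger keyword; a token matches if it starts with the keyword
    (a simple, assumed stand-in for stemming, e.g. "interacts" -> "interact")."""
    return [sum(1 for t in tokens if t.startswith(k)) for k in keywords]


def protein_entity_features(protein_mentions):
    """Hypothetical protein named entity features computed from tagger output:
    total mentions, distinct proteins, and whether at least two distinct proteins occur."""
    distinct = {m.lower() for m in protein_mentions}
    return [len(protein_mentions), len(distinct), int(len(distinct) >= 2)]


def integrate(*feature_lists):
    """Integrate representations by concatenating their feature vectors."""
    vector = []
    for features in feature_lists:
        vector.extend(features)
    return vector


if __name__ == "__main__":
    doc = "RAD51 interacts with BRCA2, and the two proteins bind in a complex."
    tokens = tokenize(doc)
    mentions = ["RAD51", "BRCA2"]  # assumed output of an external protein tagger
    print(integrate(trigger_features(tokens), protein_entity_features(mentions)))
```

Concatenation keeps each representation's features in a fixed position, so a linear classifier such as an SVM can weight trigger keyword and named entity features independently. Such a vector is far smaller than a full bag-of-words vocabulary, which is one reason compact representations can be cheap to compute at detection time.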

Original language: English
Pages (from-to): 3.1-3.17
Journal: CEUR Workshop Proceedings
Volume: 319
Publication status: Published - 2007
Event: 2nd International Symposium on Languages in Biology and Medicine, LBM 2007 - Singapore, Singapore
Duration: 6 Dec 2007 to 7 Dec 2007
