Abstract
Background: Automatically detecting articles relevant to protein-protein interactions (PPI) is a crucial step in extracting and retrieving PPI information from text for database curation. The vast majority of work on this task uses the "bag-of-words" representation, in which each feature corresponds to a single word. To capture information that this simple representation leaves out, we examined alternative text representations based on advanced natural language techniques (protein named entities) and on biological domain knowledge (trigger keywords). Results: These feature representations were evaluated with an SVM classifier on the BioCreAtIvE II benchmark corpus. On their own, the new representations did not produce a significant performance improvement according to statistical significance tests. However, the performance achieved by integrating 70 trigger keywords with 4 protein named entity features is comparable to that achieved with bag-of-words alone. In addition, using only the 4 protein named entity features (4PNE) gave the best recall (98.13%). Conclusions: In general, our work supports the view that more sophisticated natural language processing (NLP) techniques, and more advanced use of them, need to be developed before better text representations can be produced. Feature representations built with simple NLP techniques would nonetheless benefit real-life detection systems and exhaustive curation systems, which could then be implemented with great efficiency and speed without losing classification performance.
| Original language | English |
|---|---|
| Pages (from-to) | 3.1-3.17 |
| Journal | CEUR Workshop Proceedings |
| Volume | 319 |
| Publication status | Published - 2007 |
| Event | 2nd International Symposium on Languages in Biology and Medicine, LBM 2007 - Singapore, Singapore. Duration: 6 Dec 2007 → 7 Dec 2007 |
| Title | The integration of multiple feature representations for protein protein interaction classification task |