Improved relative term frequency probability feature selection for document categorization

  • East China Normal University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Feature selection is an important step in document classification: it chooses the subset of features relevant to a particular application. Terms that occur unevenly across categories carry strong discriminative information for categorization. First, based on the categorical document frequency probability (CTFP), a CTFP_VM feature selection algorithm is designed. Second, a maximum term frequency conditional distribution factor is proposed to further improve the CTFP_VM criterion. Document categorization experiments are performed with SVM classifiers on the well-known Reuters-21578 and 20news-18828 corpora, used as the unbalanced and the balanced corpus respectively. The experiments compare the proposed methods with conventional feature selection algorithms, and the proposed method yields an excellent feature set for document categorization.
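The idea that unevenly distributed terms are the discriminative ones can be illustrated with a minimal Python sketch. It is not the CTFP_VM criterion, whose formula the abstract does not give. As an assumption made only for illustration, each term is scored by the population variance of its relative frequency across categories; the function names and the toy `docs` data are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import pvariance


def category_term_probabilities(docs):
    """Return {category: {term: p}}, where p is the term's share of all
    tokens observed in that category (an assumed, simplified estimate)."""
    counts = defaultdict(Counter)
    for category, tokens in docs:
        counts[category].update(tokens)
    probs = {}
    for category, counter in counts.items():
        total = sum(counter.values())
        probs[category] = {t: c / total for t, c in counter.items()}
    return probs


def unevenness_scores(docs):
    """Score each term by the variance of its per-category probability.
    Illustrative stand-in only, not the paper's CTFP_VM criterion."""
    probs = category_term_probabilities(docs)
    vocab = set().union(*(p.keys() for p in probs.values()))
    return {
        term: pvariance([probs[c].get(term, 0.0) for c in probs])
        for term in vocab
    }


def select_features(docs, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    scores = unevenness_scores(docs)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy corpus: (category, tokens) pairs.
docs = [
    ("grain", ["wheat", "price", "wheat"]),
    ("grain", ["wheat", "export"]),
    ("sport", ["match", "price", "goal"]),
    ("sport", ["goal", "match"]),
]
selected = select_features(docs, 3)
```

In this toy corpus, "price" is equally frequent in both categories and scores zero, so it is not selected, while "wheat", which occurs in only one category, ranks first.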

Original language: English
Title of host publication: Achievements in Engineering Sciences
Publisher: Trans Tech Publications
Pages: 1102-1109
Number of pages: 8
ISBN (Print): 9783038350842
DOI
Publication status: Published - 2014
Event: 3rd International Conference on Manufacturing Engineering and Process, ICMEP 2014 - Seoul, South Korea
Duration: 10 Apr 2014 to 11 Apr 2014

Publication series

Name: Applied Mechanics and Materials
Volume: 548-549
ISSN (Print): 1660-9336
ISSN (Electronic): 1662-7482

Conference

Conference: 3rd International Conference on Manufacturing Engineering and Process, ICMEP 2014
Country/Territory: South Korea
City: Seoul
Period: 10/04/14 to 11/04/14
