Dimension reduction based on categorical fuzzy correlation degree for document categorization

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

High dimensionality of the feature space is a common problem in document categorization. Most of the features obtained through conventional feature selection algorithms such as information gain (IG) are relevant but redundant. This paper proposes a two-step feature selection method. In the first step, a redundancy analysis of the original features, based on the categorical fuzzy correlation degree, filters out redundant features that have similar categorical term frequency distributions. In the second step, the conventional IG feature selection algorithm selects the final feature set for document categorization. Experiments on the well-known Reuters-21578 and 20news-18828 corpora show that the proposed method eliminates features that have a high fuzzy correlation degree with one another and yields a compressed feature space of dramatically reduced dimension. The categorization results on both corpora show that the conventional IG algorithm achieves better document categorization performance on the compressed feature space, which demonstrates the effectiveness of the proposed method.
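The two-step pipeline described in the abstract can be illustrated with a small sketch. The abstract does not define the categorical fuzzy correlation degree, so the code below substitutes an assumed min/max fuzzy similarity between per-category term frequency distributions; the greedy filtering order, the `threshold` value, and all function names are likewise hypothetical and are not taken from the paper. Only the overall shape (redundancy filtering first, IG ranking second) follows the abstract.

```python
import math
from collections import defaultdict

def fuzzy_correlation(p, q):
    # ASSUMPTION: min/max fuzzy similarity, used as a stand-in for the
    # paper's categorical fuzzy correlation degree (not defined in the abstract).
    den = sum(max(a, b) for a, b in zip(p, q))
    return sum(min(a, b) for a, b in zip(p, q)) / den if den > 0 else 0.0

def categorical_tf(docs, labels):
    # Count how often each term occurs in each category.
    tf = defaultdict(lambda: [0] * len(labels))
    for tokens, label in docs:
        for tok in tokens:
            tf[tok][labels.index(label)] += 1
    return tf

def filter_redundant(cat_tf, threshold=0.9):
    # Step 1 (hypothetical greedy variant): visit terms by descending total
    # frequency and keep a term only if its distribution over categories is
    # not too similar to that of any term already kept.
    kept, dists = [], {}
    for term in sorted(cat_tf, key=lambda t: -sum(cat_tf[t])):
        total = sum(cat_tf[term])
        dist = [c / total for c in cat_tf[term]]
        if all(fuzzy_correlation(dist, dists[k]) < threshold for k in kept):
            kept.append(term)
            dists[term] = dist
    return kept

def entropy(counts):
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(term, docs, labels):
    # Step 2: standard IG of term presence/absence with respect to the class.
    present = [0] * len(labels)
    absent = [0] * len(labels)
    for tokens, label in docs:
        target = present if term in tokens else absent
        target[labels.index(label)] += 1
    n = len(docs)
    overall = [a + b for a, b in zip(present, absent)]
    return (entropy(overall)
            - sum(present) / n * entropy(present)
            - sum(absent) / n * entropy(absent))

def two_step_select(docs, labels, k, threshold=0.9):
    kept = filter_redundant(categorical_tf(docs, labels), threshold)
    ranked = sorted(kept, key=lambda t: -information_gain(t, docs, labels))
    return kept, ranked[:k]

# Toy data, invented for illustration only.
toy_labels = ["energy", "grain"]
toy_docs = [
    (["oil", "crude", "price"], "energy"),
    (["oil", "crude", "barrel"], "energy"),
    (["wheat", "grain", "price"], "grain"),
    (["wheat", "grain", "harvest"], "grain"),
]
kept_terms, selected_terms = two_step_select(toy_docs, toy_labels, k=2)
```

In this toy run, "crude" and "barrel" share the category distribution of "oil" and are filtered in the first step, so IG ranks a smaller candidate set in the second step. How closely this mirrors the behaviour reported in the paper depends on the actual correlation measure, which the abstract leaves unspecified.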

Original language: English
Title of host publication: Proceedings - 2013 IEEE International Conference on Granular Computing, GrC 2013
Publisher: IEEE Computer Society
Pages: 186-190
Number of pages: 5
ISBN (Print): 9781479912810
DOI
Publication status: Published - 2013
Event: 2013 IEEE International Conference on Granular Computing, GrC 2013 - Beijing, China
Duration: 13 Dec 2013 to 15 Dec 2013

Publication series

Name: Proceedings - 2013 IEEE International Conference on Granular Computing, GrC 2013

Conference

Conference: 2013 IEEE International Conference on Granular Computing, GrC 2013
Country/Territory: China
City: Beijing
Period: 13/12/13 to 15/12/13
