Dimension reduction based on categorical fuzzy correlation degree for document categorization

Qiang Li, Liang He, Xin Lin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations

Abstract

High dimensionality of the feature space is a common problem in document categorization. Most of the features obtained through conventional feature selection algorithms such as information gain (IG) are relevant but redundant. In this paper, a two-step feature selection method is proposed. In the first step, redundancy analysis based on the categorical fuzzy correlation degree is applied to the original features to filter out redundant features that have similar categorical term frequency distributions. In the second step, the conventional IG feature selection algorithm is adopted to select the final feature set for document categorization. Experiments on the well-known Reuters-21578 and 20news-18828 corpora show that the proposed method can eliminate redundant features that have a high fuzzy correlation degree with one another and obtain a compressed feature space whose dimension is dramatically reduced. The document categorization results on the two corpora show that the conventional IG feature selection algorithm achieves better document categorization performance on the compressed feature space, demonstrating the effectiveness of the proposed method.
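
The two-step procedure described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not define the categorical fuzzy correlation degree, so the sketch substitutes a generic fuzzy similarity (sum of element-wise minima over sum of maxima) between per-category term frequency distributions. The greedy filtering order, the `threshold` value, and all function names are likewise hypothetical.

```python
import math


def fuzzy_correlation(p, q):
    # ASSUMPTION: stand-in similarity (sum of minima / sum of maxima);
    # the paper's actual fuzzy correlation degree is not given in the abstract.
    num = sum(min(a, b) for a, b in zip(p, q))
    den = sum(max(a, b) for a, b in zip(p, q))
    return num / den if den > 0 else 0.0


def normalize(counts):
    # Turn per-category term counts into a distribution over categories.
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def filter_redundant(cat_tf, threshold=0.9):
    # Step 1 (hypothetical greedy variant): visit terms from most to least
    # frequent and keep a term only if its category distribution is not
    # highly correlated with that of any term already kept.
    dists = {t: normalize(c) for t, c in cat_tf.items()}
    kept = []
    for term in sorted(cat_tf, key=lambda t: (-sum(cat_tf[t]), t)):
        if all(fuzzy_correlation(dists[term], dists[k]) < threshold for k in kept):
            kept.append(term)
    return kept


def entropy(counts):
    # Shannon entropy (base 2) of a list of class counts.
    total = sum(counts)
    if not total:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    # Standard IG of a term's presence/absence with respect to the class label.
    classes = sorted(set(labels))

    def class_counts(subset):
        return [sum(1 for lab in subset if lab == c) for c in classes]

    present = [lab for d, lab in zip(docs, labels) if term in d]
    absent = [lab for d, lab in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(present) / n) * entropy(class_counts(present)) \
        + (len(absent) / n) * entropy(class_counts(absent))
    return entropy(class_counts(labels)) - conditional


def two_step_select(docs, labels, k, threshold=0.9):
    # Build categorical term frequencies, filter redundant terms (step 1),
    # then rank the survivors by IG and keep the top k (step 2).
    classes = sorted(set(labels))
    cat_tf = {}
    for d, lab in zip(docs, labels):
        for tok in d:
            cat_tf.setdefault(tok, [0] * len(classes))[classes.index(lab)] += 1
    survivors = filter_redundant(cat_tf, threshold)
    ranked = sorted(survivors, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]
```

Here `docs` is assumed to be a list of token lists and `labels` the matching list of category names. In this sketch, terms whose category distributions nearly coincide collapse to a single representative before IG ranking, which is the intuition behind the compressed feature space mentioned in the abstract.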

Original language: English
Title of host publication: Proceedings - 2013 IEEE International Conference on Granular Computing, GrC 2013
Publisher: IEEE Computer Society
Pages: 186-190
Number of pages: 5
ISBN (Print): 9781479912810
State: Published - 2013
Event: 2013 IEEE International Conference on Granular Computing, GrC 2013 - Beijing, China
Duration: 13 Dec 2013 → 15 Dec 2013

Publication series

Name: Proceedings - 2013 IEEE International Conference on Granular Computing, GrC 2013

Conference

Conference: 2013 IEEE International Conference on Granular Computing, GrC 2013
Country/Territory: China
City: Beijing
Period: 13/12/13 → 15/12/13

Keywords

  • document categorization
  • feature selection
  • fuzzy correlation degree
  • redundancy
  • relevance
