Automatic topic identification using webpage clustering

Xiaofeng He*, Chris H.Q. Ding, Hongyuan Zha, Horst D. Simon

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

65 Scopus citations

Abstract

Grouping webpages into distinct topics is one way to organize the large amount of information retrieved from the web. In this paper, we report that, based on a similarity metric that incorporates textual information, hyperlink structure, and co-citation relations, an unsupervised clustering method can automatically and effectively identify relevant topics, as shown in experiments on several retrieved sets of webpages. The clustering method is a state-of-the-art spectral graph partitioning method based on the normalized cut criterion, first developed for image segmentation.
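To make the normalized cut idea in the abstract concrete, the following is a minimal sketch of a two-way spectral partition, not the authors' implementation. It assumes a symmetric, nonnegative similarity matrix `W` is already available; the paper's actual metric (text, hyperlinks, co-citation) is not reproduced here, and the toy matrix, the function name `normalized_cut_bipartition`, and the split-at-zero threshold are illustrative assumptions.

```python
import numpy as np

def normalized_cut_bipartition(W):
    """Split a weighted graph into two groups via the relaxed normalized cut.

    W is assumed symmetric and nonnegative, with every node having
    positive total weight (degree).
    """
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)                      # node degrees
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order for symmetric matrices
    _, eigvecs = np.linalg.eigh(L_sym)
    # Second-smallest eigenvector, mapped back to the generalized
    # problem (D - W) y = lambda * D y
    y = d_inv_sqrt * eigvecs[:, 1]
    # Assumption: threshold at zero; other split points are possible
    return y >= 0

# Toy similarity matrix (made up): two tightly linked triples of pages
# joined by one weak link between page 2 and page 3.
W = np.array([
    [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 0.0],
])
labels = normalized_cut_bipartition(W)
print(labels)
```

On this toy graph the two triples land in different groups; which group is labeled `True` depends on the arbitrary sign of the eigenvector. More than two topics could be obtained by applying the split recursively, which is one common choice rather than something stated in the abstract.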

Original language: English
Title of host publication: Proceedings - 2001 IEEE International Conference on Data Mining, ICDM'01
Pages: 195-202
Number of pages: 8
State: Published - 2001
Externally published: Yes
Event: 1st IEEE International Conference on Data Mining, ICDM'01 - San Jose, CA, United States
Duration: 29 Nov 2001 to 2 Dec 2001

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print): 1550-4786

Conference

Conference: 1st IEEE International Conference on Data Mining, ICDM'01
Country/Territory: United States
City: San Jose, CA
Period: 29/11/01 to 2/12/01
