Building a Web Thesaurus from Web Link Structure

  • Zheng Chen*
  • Shengping Liu
  • Liu Wenyin
  • Geguang Pu
  • Wei Ying Ma
  • *Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review

48 Scopus citations

Abstract

Thesauri have been widely used in many applications, including information retrieval, natural language processing, and question answering. In this paper, we propose a novel approach to automatically constructing a domain-specific thesaurus from the Web using link structure information. The proposed approach can identify new terms and reflect the latest relationships between terms as the Web evolves. First, a set of high-quality, representative websites for a specific domain is selected. Next, after navigational links are filtered out, link analysis is applied to each website to obtain its content structure. Finally, the thesaurus is constructed by merging the content structures of the selected websites. Experimental results on automatic query expansion based on the constructed thesaurus show a 20% improvement in search precision compared to the baseline.
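
The three steps in the abstract (filter navigational links, derive each site's content structure, merge across sites) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the link-analysis method, so the sketch assumes each page is already represented by a single term, treats a link target as navigational when more than half of a site's linking pages point to it, and keeps a parent/child relation only when enough sites agree on it. The names filter_navigational, merge_structures, expand_query, max_share, and min_support are all hypothetical.

```python
from collections import Counter, defaultdict


def filter_navigational(links, max_share=0.5):
    """Drop links to targets that most pages point to (assumed heuristic).

    links is a list of (source_term, target_term) pairs for one website.
    """
    sources = {src for src, _ in links}
    linked_from = defaultdict(set)
    for src, dst in links:
        linked_from[dst].add(src)
    return [
        (src, dst)
        for src, dst in links
        if len(linked_from[dst]) / len(sources) <= max_share
    ]


def merge_structures(sites, min_support=2):
    """Keep parent/child relations that appear on at least min_support sites."""
    support = Counter()
    for links in sites:
        # Count each relation once per site, after navigational filtering.
        for edge in set(filter_navigational(links)):
            support[edge] += 1
    thesaurus = defaultdict(set)
    for (parent, child), count in support.items():
        if count >= min_support:
            thesaurus[parent].add(child)
    return dict(thesaurus)


def expand_query(terms, thesaurus):
    """Append the related terms of each query term, preserving order."""
    expanded = list(terms)
    for term in terms:
        for related in sorted(thesaurus.get(term, ())):
            if related not in expanded:
                expanded.append(related)
    return expanded


# Two toy sites; links back to "photography" play the role of navigation.
site_a = [
    ("photography", "cameras"),
    ("cameras", "lenses"),
    ("cameras", "photography"),
    ("lenses", "photography"),
]
site_b = [
    ("photography", "cameras"),
    ("cameras", "lenses"),
    ("cameras", "tripods"),
    ("cameras", "photography"),
    ("lenses", "photography"),
    ("tripods", "photography"),
]

thesaurus = merge_structures([site_a, site_b])
expanded = expand_query(["cameras"], thesaurus)
```

In this toy data only the relations shared by both sites survive the merge, so a query for "cameras" is expanded with "lenses" but not "tripods". The support threshold is a design choice made for this sketch; the paper may weight or rank relations differently.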

Original language: English
Pages (from-to): 48-55
Number of pages: 8
Journal: SIGIR Forum (ACM Special Interest Group on Information Retrieval)
Issue number: SPEC. ISS.
State: Published - 2003
Externally published: Yes
Event: Proceedings of the Twenty-Sixth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2003 - Toronto, Ont., Canada
Duration: 28 Jul 2003 → 1 Aug 2003

Keywords

  • Content Structure
  • Link Analysis
  • Query Expansion
  • Thesaurus
