An improved algorithm for weighting keywords in web documents

Shuang Sun*, Liang He, Jing Yang, Jun Zhong Gu

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

This paper presents the web-based keyword weight algorithm (WKWA), an improved algorithm for weighting keywords in web documents. WKWA combines the representation features of web documents with the advantages of the TF*IDF, TFC, and ITC algorithms to make the weighting more appropriate for web documents. The algorithm is applied to an improved vector space model (IVSM), and a real system has been implemented to calculate the semantic similarity of web documents. Four experiments were carried out: keyword weight calculation, feature item selection, semantic similarity calculation, and WKWA time performance. The results demonstrate that the accuracy of both keyword weighting and semantic similarity is improved.
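
The abstract does not give the WKWA formula, so the following is only a minimal sketch of the baseline schemes it names, not the authors' method. It assumes the classic TF*IDF weight (term frequency times log(N / document frequency)), treats TFC as that weight scaled to unit vector length, and compares documents with cosine similarity, as in a standard vector space model. The function names and the token-list input format are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Classic TF*IDF. `docs` is a list of token lists (an assumed input format).

    Returns one {term: weight} dict per document.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


def tfc_normalize(vec):
    """Scale a weight vector to unit length (TFC-style length normalization)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    if norm == 0:
        return dict(vec)
    return {t: w / norm for t, w in vec.items()}


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A web-specific scheme such as WKWA would presumably adjust these weights using page structure (for example, where a keyword appears), but the abstract does not specify how, so that step is left out here.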

Original language: English
Pages (from-to): 235-239
Number of pages: 5
Journal: Journal of Shanghai University
Volume: 12
Issue number: 3
State: Published - Jun 2008

Keywords

  • Feature item
  • Improved vector space model (IVSM)
  • Keyword weight
  • Representation feature
  • Semantic similarity
