Abstract
This paper presents an improved algorithm, the web-based keyword weight algorithm (WKWA), for weighting keywords in web documents. WKWA takes into account the representation features of web documents and the advantages of the TF*IDF, TFC and ITC algorithms, making it more appropriate for web documents. The algorithm is also applied to an improved vector space model (IVSM). A real system has been implemented for calculating the semantic similarity of web documents. Four experiments have been carried out: keyword weight calculation, feature item selection, semantic similarity calculation, and WKWA time performance. The results demonstrate that the accuracy of both keyword weighting and semantic similarity calculation is improved.
| Original language | English |
|---|---|
| Pages (from-to) | 235-239 |
| Number of pages | 5 |
| Journal | Journal of Shanghai University |
| Volume | 12 |
| Issue number | 3 |
| State | Published - Jun 2008 |
Keywords
- Feature item
- Improved vector space model (IVSM)
- Keyword weight
- Representation feature
- Semantic similarity