Similarity measure and structural index of XML documents

Shi Hui Zheng*, Ao Ying Zhou, Long Zhang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

10 Scopus citations

Abstract

This paper presents a quantitative approach to measuring the difference between two XML documents, called the XED distance. An XML document can be represented as a concise, weighted structural index tree. It is proven that the similarity between two XML documents can be measured by the distance between their structural index trees. Because the structural index tree is dramatically smaller than the original document tree, it greatly reduces the cost of measuring the similarity between two XML documents. The approach can be used in many applications, such as approximate searching of XML documents, clustering of XML documents, structure extraction from XML documents, change detection in XML documents, and incremental maintenance of XML views.
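
The abstract does not give the definition of the structural index tree or of the XED distance, so the following Python sketch only illustrates the general idea under stated assumptions. It assumes that the index merges all elements sharing the same root-to-node label path and uses the number of merged elements as the weight, and it compares two indexes by summing absolute weight differences. The names `structural_index` and `index_distance` are hypothetical, and the distance shown is a simple stand-in, not the paper's XED distance.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def structural_index(xml_text):
    """Collapse a document into {label path: number of elements on that path}.

    Assumption: elements with the same root-to-node tag path are merged,
    and the weight of the merged node is how many elements it absorbed.
    """
    root = ET.fromstring(xml_text)
    index = Counter()
    stack = [(root, (root.tag,))]
    while stack:
        node, path = stack.pop()
        index[path] += 1
        for child in node:
            stack.append((child, path + (child.tag,)))
    return index


def index_distance(a, b):
    """Sum of absolute weight differences over all label paths.

    This is an illustrative stand-in, not the XED distance from the paper.
    """
    return sum(abs(a[p] - b[p]) for p in set(a) | set(b))


doc1 = "<lib><book><title/><author/></book><book><title/></book></lib>"
doc2 = "<lib><book><title/></book><cd><title/></cd></lib>"

idx1 = structural_index(doc1)
idx2 = structural_index(doc2)

# The index of doc1 has fewer entries than doc1 has elements,
# because the two <book> elements and their <title> children are merged.
print(len(idx1), index_distance(idx1, idx2))
```

Because the comparison runs over the merged paths rather than over every element, its cost depends on the size of the index, which is the saving the abstract describes.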

Original language: English
Pages (from-to): 1116-1122
Number of pages: 7
Journal: Jisuanji Xuebao/Chinese Journal of Computers
Volume: 26
Issue number: 9
State: Published - Sep 2003
Externally published: Yes

Keywords

  • Edit distance
  • Structural index tree
  • XED distance
