GRAMS3: An efficient framework for XML structural similarity search

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

1 Scopus citations

Abstract

Structural similarity search is a fundamental technology for XML data management. However, existing methods do not scale well with large volume of XML document. The pq-gram is an efficient way of extracting substructure from the tree-structured data for approximate structural similarity search. In this paper, we propose an effective framework GRAMS3 for evaluating structural similarity of XML data. First pq-grams of XML document are extracted; then we study the characteristics of pq-gram of XML and generate doc-gram vector using TGF-IGF model for XML tree; finally we employ locality sensitive hashing for efficiently structural similarity search of XML documents. An empirical study using both synthetic and real datasets demonstrates the framework is efficient.

Original languageEnglish
Title of host publicationDatabase Systems for Advanced Applications - 15th International Conference, DASFAA 2010, International Workshops
Subtitle of host publicationGDM, BenchmarX, MCIS, SNSMW, DIEW, UDM, Revised Selected Papers
Pages422-433
Number of pages12
DOIs
StatePublished - 2010
Event15th International Conference on Database Systems for Advanced Applications, DASFAA 2010 - Tsukuba, Japan
Duration: 1 Apr 20104 Apr 2010

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume6193 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference15th International Conference on Database Systems for Advanced Applications, DASFAA 2010
Country/TerritoryJapan
CityTsukuba
Period1/04/104/04/10

Keywords

  • Locality Sensitive Hashing
  • Structural Similarity
  • XML Document
  • pq-Gram

Fingerprint

Dive into the research topics of 'GRAMS3: An efficient framework for XML structural similarity search'. Together they form a unique fingerprint.

Cite this