XML structural similarity search using MapReduce

Peisen Yuan, Chaofeng Sha, Xiaoling Wang, Bin Yang, Aoying Zhou, Su Yang

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

27 Scopus citations

Abstract

XML is a de facto standard for web data exchange and information representation. Efficiently managing large volumes of XML data challenges conventional techniques. To cope with large-scale data, the MapReduce computing framework has attracted increasing attention in the database community as an efficient solution. In this paper, we propose an efficient and scalable framework for XML structural similarity search on a large cluster using MapReduce. First, sub-structures of XML documents are extracted in parallel from a large XML corpus stored on the cluster. Then, Min-Hashing and locality sensitive hashing (LSH) techniques are developed on the distributed, parallel computing framework for efficient structural similarity search. An empirical study on the cluster with large real datasets demonstrates the effectiveness and efficiency of our approach.
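The abstract describes a pipeline of parallel sub-structure extraction followed by Min-Hashing and LSH. The following is a minimal, non-authoritative Python sketch of that general idea, not the authors' implementation. Assumptions: sub-structures are taken to be root-to-node tag paths; the map and reduce phases are plain Python functions, with an in-memory dict standing in for the MapReduce shuffle (no Hadoop involved); the signature length `NUM_HASHES`, the banding parameters `BANDS` and `ROWS`, and every function name are hypothetical choices for illustration.

```python
import hashlib
import xml.etree.ElementTree as ET
from collections import defaultdict
from itertools import combinations

NUM_HASHES = 32      # hypothetical MinHash signature length
BANDS, ROWS = 8, 4   # hypothetical LSH banding; BANDS * ROWS == NUM_HASHES


def label_paths(xml_text):
    """Return the set of root-to-node tag paths, e.g. 'book/author/name'.

    Treating tag paths as the XML 'sub-structures' is an assumption of this sketch.
    """
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}" if prefix else node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def hash_with_seed(seed, token):
    """Deterministic 64-bit hash of `token` under hash function number `seed`."""
    digest = hashlib.blake2b(f"{seed}:{token}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash(tokens):
    """MinHash signature: the minimum hash value under each of NUM_HASHES functions."""
    return [min(hash_with_seed(i, t) for t in tokens) for i in range(NUM_HASHES)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def map_phase(doc_id, xml_text):
    """Map: emit ((band index, band values), doc_id) for every LSH band."""
    sig = minhash(label_paths(xml_text))
    for b in range(BANDS):
        yield (b, tuple(sig[b * ROWS:(b + 1) * ROWS])), doc_id


def reduce_phase(grouped):
    """Reduce: documents that share any band bucket become candidate pairs."""
    candidates = set()
    for doc_ids in grouped.values():
        for pair in combinations(sorted(set(doc_ids)), 2):
            candidates.add(pair)
    return candidates


def run(docs):
    """Run map, an in-memory stand-in for the shuffle, then reduce."""
    grouped = defaultdict(list)
    for doc_id, xml_text in docs.items():
        for key, value in map_phase(doc_id, xml_text):
            grouped[key].append(value)
    return reduce_phase(grouped)
```

Documents whose signatures agree on every row of at least one band land in the same reduce group and become candidate pairs. `estimated_jaccard` can then verify a candidate, because the fraction of matching MinHash values is an unbiased estimate of the Jaccard similarity of the two sub-structure sets. Using more bands with fewer rows each raises recall at the cost of more false candidates.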

Original language: English
Title of host publication: Web-Age Information Management - 11th International Conference, WAIM 2010, Proceedings
Pages: 169-181
Number of pages: 13
State: Published - 2010
Event: 11th International Conference on Web-Age Information Management, WAIM 2010 - Jiuzhaigou, China
Duration: 15 Jul 2010 to 17 Jul 2010

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 6184 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 11th International Conference on Web-Age Information Management, WAIM 2010
Country/Territory: China
City: Jiuzhaigou
Period: 15/07/10 to 17/07/10
