TY - GEN
T1 - Approximate top-k structural similarity search over XML documents
AU - Xie, Tao
AU - Sha, Chaofeng
AU - Wang, Xiaoling
AU - Zhou, Aoying
PY - 2006
Y1 - 2006
N2 - With the development of XML applications, such as Digital Library, XML subscribe/publish system, and other XML repositories, top-k structural similarity search over XML documents is attracting more attention. The similarity of two XML documents can be measured by using the edit distance defined between XML trees in previous work. Since the computation of edit distances is time consuming, some recent work presented some approaches to calculate edit distance by using structural summaries to improve the algorithm performance. However, most existing algorithms for calculating edit distance between trees ignore the fact that nodes in a tree may be of different significance, and the same edit operation costs are assumed inappropriately for all nodes in XML document tree. This paper addresses this problem by proposing a summary structure which could be used to make the tree-based edit distance more rational; furthermore, a novel weighting scheme is proposed to indicate that some nodes are more important than others with respect for structural similarity. We introduce a new cost model for computing structural distance and takes weight information into account for nodes in distance computation in this paper. Compared with former techniques, our approach can approximately answer the top-k queries efficiently. We verify this approach through a series of experiments, and the results show that using weighted structural summaries for top-k queries is efficient and practical.
AB - With the development of XML applications, such as Digital Library, XML subscribe/publish system, and other XML repositories, top-k structural similarity search over XML documents is attracting more attention. The similarity of two XML documents can be measured by using the edit distance defined between XML trees in previous work. Since the computation of edit distances is time consuming, some recent work presented some approaches to calculate edit distance by using structural summaries to improve the algorithm performance. However, most existing algorithms for calculating edit distance between trees ignore the fact that nodes in a tree may be of different significance, and the same edit operation costs are assumed inappropriately for all nodes in XML document tree. This paper addresses this problem by proposing a summary structure which could be used to make the tree-based edit distance more rational; furthermore, a novel weighting scheme is proposed to indicate that some nodes are more important than others with respect for structural similarity. We introduce a new cost model for computing structural distance and takes weight information into account for nodes in distance computation in this paper. Compared with former techniques, our approach can approximately answer the top-k queries efficiently. We verify this approach through a series of experiments, and the results show that using weighted structural summaries for top-k queries is efficient and practical.
UR - https://www.scopus.com/pages/publications/33745644286
U2 - 10.1007/11610113_29
DO - 10.1007/11610113_29
M3 - 会议稿件
AN - SCOPUS:33745644286
SN - 3540311424
SN - 9783540311423
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 319
EP - 330
BT - Frontiers of WWW Research and Development - APWeb 2006 - 8th Asia-Pacific Web Conference, Proceedings
T2 - 8th Asia-Pacific Web Conference, APWeb 2006: Frontiers of WWW Research and Development
Y2 - 16 January 2006 through 18 January 2006
ER -