LCJoin: Set containment join via list crosscutting

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

14 Scopus citations

Abstract

A set containment join operates on two set-valued attributes with a subset () relationship as the join condition. It has many real-world applications, such as in publish/subscribe services and inclusion dependency discovery. Existing solutions can be broadly classified into union-oriented and intersection-oriented methods. Based on several recent studies, union-oriented methods are not competitive as they involve an expensive subset enumeration step. Intersection-oriented methods build an inverted index on one attribute and perform inverted list intersection on another attribute. Existing intersection-oriented methods intersect inverted lists one-by-one. In contrast, in this paper, we propose to intersect all the inverted lists simultaneously while skipping many irrelevant entries in the lists. To share computation, we utilize the prefix tree structure and extend our novel list intersection method to operate on the prefix tree. To further improve the efficiency, we propose to partition the data and use different methods to process each partition. We evaluated our methods using both real-world and synthetic datasets. Experimental results show that our approach outperforms existing methods by up to 10×.

Original languageEnglish
Title of host publicationProceedings - 2019 IEEE 35th International Conference on Data Engineering, ICDE 2019
PublisherIEEE Computer Society
Pages362-373
Number of pages12
ISBN (Electronic)9781538674741
DOIs
StatePublished - Apr 2019
Externally publishedYes
Event35th IEEE International Conference on Data Engineering, ICDE 2019 - Macau, China
Duration: 8 Apr 201911 Apr 2019

Publication series

NameProceedings - International Conference on Data Engineering
Volume2019-April
ISSN (Print)1084-4627

Conference

Conference35th IEEE International Conference on Data Engineering, ICDE 2019
Country/TerritoryChina
CityMacau
Period8/04/1911/04/19

Keywords

  • Approximate query processing
  • Data integration
  • Set containment
  • Similarity join

Fingerprint

Dive into the research topics of 'LCJoin: Set containment join via list crosscutting'. Together they form a unique fingerprint.

Cite this