Columnar Formatted Inverted Index for Highly-Paralleled, Vectorized Query Processing

  • Weichen Zhao
  • , Minghao Zhao*
  • , Huiqi Hu
  • , Weining Qian
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Inverted index is a basic tool in many data-intensive applications. Though numerous efforts have been made on efficient inverted index-based query processing, existing schemes do not achieve the expected performance for modern data centers, in which servers are equipped with powerful CPUs and relatively large memory. Through comprehensive measurement studies, we identify the root course is that the data formats for index representation make it unfeasible to design efficient query execution approaches on top of it, which results in poor parallel query support and waste CPU computation. Driven by the findings, we propose to reconcile the in-memory index as columnar structures. To enable this idea, we construct the compact columnar format (i.e., Cocoa) that achieves both desirable space efficiency and maintains the capability for efficient searching support. With Cocoa, we design an efficient query executing scheme that utilizes vectorized batch processing to avoid frequent branch prediction, as well as clause enumeration with pruning to save the overhead of intermediate batch materialization. We build an open-source system VeloSearch to embody our design; experimental results show that VeloSearch achieves 30× better performance compared with state-of-the-art search libraries such as Lucene and Tantivy.

Original languageEnglish
Title of host publicationProceedings - 2025 IEEE 41st International Conference on Data Engineering, ICDE 2025
PublisherIEEE Computer Society
Pages1800-1813
Number of pages14
ISBN (Electronic)9798331536039
DOIs
StatePublished - 2025
Event41st IEEE International Conference on Data Engineering, ICDE 2025 - Hong Kong, China
Duration: 19 May 202523 May 2025

Publication series

NameProceedings - International Conference on Data Engineering
ISSN (Print)1084-4627
ISSN (Electronic)2375-0286

Conference

Conference41st IEEE International Conference on Data Engineering, ICDE 2025
Country/TerritoryChina
CityHong Kong
Period19/05/2523/05/25

Keywords

  • columnar format
  • indexing
  • query processing

Fingerprint

Dive into the research topics of 'Columnar Formatted Inverted Index for Highly-Paralleled, Vectorized Query Processing'. Together they form a unique fingerprint.

Cite this