
Columnar Formatted Inverted Index for Highly-Paralleled, Vectorized Query Processing

  • Weichen Zhao
  • Minghao Zhao*
  • Huiqi Hu
  • Weining Qian
  • *Corresponding author for this work
  • East China Normal University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The inverted index is a basic tool in many data-intensive applications. Although numerous efforts have been made toward efficient inverted index-based query processing, existing schemes do not achieve the expected performance in modern data centers, where servers are equipped with powerful CPUs and relatively large memory. Through comprehensive measurement studies, we identify the root cause: the data formats used for index representation make it infeasible to design efficient query execution approaches on top of them, which results in poor parallel query support and wasted CPU computation. Driven by these findings, we propose to reorganize the in-memory index as columnar structures. To realize this idea, we construct a compact columnar format (i.e., Cocoa) that achieves desirable space efficiency while retaining the capability for efficient search. With Cocoa, we design an efficient query execution scheme that uses vectorized batch processing to avoid frequent branch prediction, as well as clause enumeration with pruning to save the overhead of intermediate batch materialization. We build an open-source system, VeloSearch, to embody our design; experimental results show that VeloSearch achieves 30× better performance than state-of-the-art search libraries such as Lucene and Tantivy.
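
The general idea of a columnar posting layout combined with batch-at-a-time evaluation can be illustrated with a minimal sketch. This is not the Cocoa format and not VeloSearch's implementation; the names `build_columnar_index`, `batch_mask`, `conjunctive_query`, and the 64-document `BATCH_SIZE` are hypothetical choices made for illustration. Each term's document IDs are kept in one contiguous sorted array, and a conjunctive query is answered by turning each batch of document IDs into a bitmask and combining the masks with a bitwise AND, rather than advancing posting-list cursors one document at a time.

```python
from array import array
from bisect import bisect_left

# Hypothetical batch width: one 64-bit word of document slots per batch.
BATCH_SIZE = 64


def build_columnar_index(docs):
    """Map each term to a contiguous, sorted column of document IDs."""
    postings = {}
    for doc_id, text in enumerate(docs):
        # set() ensures each term is recorded once per document.
        for term in set(text.lower().split()):
            postings.setdefault(term, array("I")).append(doc_id)
    # Document IDs are appended in increasing order, so every column is sorted.
    return postings


def batch_mask(column, base):
    """Bitmask of the doc IDs in [base, base + BATCH_SIZE) present in column."""
    lo = bisect_left(column, base)
    hi = bisect_left(column, base + BATCH_SIZE)
    mask = 0
    for doc_id in column[lo:hi]:
        mask |= 1 << (doc_id - base)
    return mask


def conjunctive_query(postings, terms, num_docs):
    """Return IDs of documents containing every term, one batch at a time."""
    if not terms:
        return []
    columns = [postings.get(term, array("I")) for term in terms]
    hits = []
    for base in range(0, num_docs, BATCH_SIZE):
        mask = (1 << BATCH_SIZE) - 1
        for column in columns:
            mask &= batch_mask(column, base)
            if mask == 0:
                break  # no document in this batch can match; skip remaining terms
        # Decode the surviving bits back into document IDs, lowest first.
        while mask:
            low_bit = mask & -mask
            hits.append(base + low_bit.bit_length() - 1)
            mask ^= low_bit
    return hits


docs = [
    "columnar inverted index",
    "vectorized query processing",
    "columnar vectorized index",
    "inverted index query",
]
index = build_columnar_index(docs)
result = conjunctive_query(index, ["columnar", "index"], len(docs))
```

In this toy corpus, `result` holds the IDs of the two documents that contain both "columnar" and "index". The early exit when a batch mask becomes zero is only a simple stand-in for pruning; the clause enumeration with pruning described in the abstract is a different and more elaborate mechanism.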

Original language: English
Title of host publication: Proceedings - 2025 IEEE 41st International Conference on Data Engineering, ICDE 2025
Publisher: IEEE Computer Society
Pages: 1800-1813
Number of pages: 14
ISBN (Electronic): 9798331536039
DOI
Publication status: Published - 2025
Event: 41st IEEE International Conference on Data Engineering, ICDE 2025 - Hong Kong, China
Duration: 19 May 2025 to 23 May 2025

Publication series

Name: Proceedings - International Conference on Data Engineering
ISSN (Print): 1084-4627
ISSN (Electronic): 2375-0286

Conference

Conference: 41st IEEE International Conference on Data Engineering, ICDE 2025
Country/Territory: China
City: Hong Kong
Period: 19/05/25 to 23/05/25
