Neos: A NVMe-GPUs Direct Vector Service Buffer in User Space

  • Yuchen Huang
  • Xiaopeng Fan
  • Song Yan
  • Chuliang Weng*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Scopus citations

Abstract

With the development of AI-generated content and LLMs (Large Language Models), demands for vector management have brought prosperity to vector databases. However, the fact that vectors cannot be retrieved before being indexed harms the timeliness of vector databases, while updating indexes immediately upon adding new vectors reduces storage throughput. Due to this contradiction, when facing streaming data, using a vector database alone cannot deliver both real-time search and high-throughput storage. This paper proposes a vector buffer engine, Neos, designed for real-time searches over unindexed vectors on streaming input and for buffering vectors with high throughput before loading them into vector databases. On one hand, we build a lightweight store on a raw NVMe device, freeing throughput from index maintenance to maximize storage performance. On the other hand, we realize a direct NVMe-GPUs I/O stack and a CPU-GPU heterogeneous task architecture for low-latency unindexed-vector searches on streaming data. Experiments show that our approach achieves 1.5x to 3.4x the bandwidth and as low as 20% of the latency of existing I/O stacks, and up to orders-of-magnitude higher vector storage throughput under concurrent read/write workloads. Further, Neos can handle real-time unindexed-vector searches with millisecond-level latency on streaming input, a capability that current vector systems lack.
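The core trade-off the abstract describes can be illustrated with a minimal, hypothetical sketch (not the paper's implementation): new vectors are simply appended to an unindexed buffer, so inserts never pay an index-maintenance cost, and queries fall back to a brute-force scan over the buffered vectors. Neos performs this scan on GPUs directly over NVMe; the stand-in below uses an in-memory NumPy array on the CPU, and all class and method names are invented for illustration.

```python
import numpy as np

class VectorBuffer:
    """Append-only unindexed vector buffer with brute-force search.

    Hypothetical stand-in for the buffering idea: Neos scans on GPUs
    over raw NVMe; here we scan an in-memory NumPy array on the CPU.
    """

    def __init__(self, dim):
        self.dim = dim
        self.vectors = np.empty((0, dim), dtype=np.float32)

    def insert(self, batch):
        # No index update: an insert costs only an append,
        # which is what keeps storage throughput high.
        batch = np.asarray(batch, dtype=np.float32)
        self.vectors = np.vstack([self.vectors, batch])

    def search(self, query, k=1):
        # Brute-force L2 scan over every buffered vector, so freshly
        # inserted vectors are searchable immediately.
        query = np.asarray(query, dtype=np.float32)
        dists = np.linalg.norm(self.vectors - query, axis=1)
        topk = np.argsort(dists)[:k]
        return topk, dists[topk]

buf = VectorBuffer(dim=4)
buf.insert([[0, 0, 0, 0], [1, 1, 1, 1]])
ids, dists = buf.search([1, 1, 1, 0.9], k=1)
# The nearest neighbor is the second inserted vector (index 1).
```

In a real system the scan cost grows with buffer size, which is why buffered vectors are eventually flushed into an indexed vector database; the buffer only has to keep the window of not-yet-indexed streaming data searchable.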

Original language: English
Title of host publication: Proceedings - 2024 IEEE 40th International Conference on Data Engineering, ICDE 2024
Publisher: IEEE Computer Society
Pages: 3767-3781
Number of pages: 15
ISBN (Electronic): 9798350317152
DOIs
State: Published - 2024
Event: 40th IEEE International Conference on Data Engineering, ICDE 2024 - Utrecht, Netherlands
Duration: 13 May 2024 - 17 May 2024

Publication series

Name: Proceedings - International Conference on Data Engineering
ISSN (Print): 1084-4627
ISSN (Electronic): 2375-0286

Conference

Conference: 40th IEEE International Conference on Data Engineering, ICDE 2024
Country/Territory: Netherlands
City: Utrecht
Period: 13/05/24 - 17/05/24

Keywords

  • Multi-GPU
  • NVMe
  • Vector search and storage
