TY - GEN
T1 - Neos
T2 - 40th IEEE International Conference on Data Engineering, ICDE 2024
AU - Huang, Yuchen
AU - Fan, Xiaopeng
AU - Yan, Song
AU - Weng, Chuliang
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - With the development of AI-generated content and LLMs (Large Language Models), demands for vector management have brought prosperity to vector databases. However, the fact that vectors cannot be retrieved before being indexed harms the timeliness of vector databases, while updating indexes immediately when adding new vectors reduces storage throughput. Due to this contradiction, when facing streaming data, using a vector database alone in vector services cannot have it both ways: real-time searches and high-throughput storage. This paper proposes a vector buffer engine, Neos. It is designed for real-time unindexed-vector searches on streaming input and for buffering vectors with high throughput before loading them into vector databases. On one hand, we build lightweight storage on a raw NVMe device and liberate throughput from indexes to maximize storage performance. On the other hand, we realize a direct NVMe-GPUs I/O stack and a CPU-GPU heterogeneous task architecture for low-latency unindexed-vector searches on streaming data. Experiments show that our approach achieves 1.5x to 3.4x bandwidth and as low as 20% latency compared to existing I/O stacks, and up to orders-of-magnitude higher vector storage throughput under concurrent R/W workloads. Further, Neos can handle real-time unindexed-vector searches with millisecond-level latency on streaming input, a capability that current vector systems lack.
AB - With the development of AI-generated content and LLMs (Large Language Models), demands for vector management have brought prosperity to vector databases. However, the fact that vectors cannot be retrieved before being indexed harms the timeliness of vector databases, while updating indexes immediately when adding new vectors reduces storage throughput. Due to this contradiction, when facing streaming data, using a vector database alone in vector services cannot have it both ways: real-time searches and high-throughput storage. This paper proposes a vector buffer engine, Neos. It is designed for real-time unindexed-vector searches on streaming input and for buffering vectors with high throughput before loading them into vector databases. On one hand, we build lightweight storage on a raw NVMe device and liberate throughput from indexes to maximize storage performance. On the other hand, we realize a direct NVMe-GPUs I/O stack and a CPU-GPU heterogeneous task architecture for low-latency unindexed-vector searches on streaming data. Experiments show that our approach achieves 1.5x to 3.4x bandwidth and as low as 20% latency compared to existing I/O stacks, and up to orders-of-magnitude higher vector storage throughput under concurrent R/W workloads. Further, Neos can handle real-time unindexed-vector searches with millisecond-level latency on streaming input, a capability that current vector systems lack.
KW - Multi-GPU
KW - NVMe
KW - Vector search and storage
UR - https://www.scopus.com/pages/publications/85200491544
U2 - 10.1109/ICDE60146.2024.00289
DO - 10.1109/ICDE60146.2024.00289
M3 - Conference contribution
AN - SCOPUS:85200491544
T3 - Proceedings - International Conference on Data Engineering
SP - 3767
EP - 3781
BT - Proceedings - 2024 IEEE 40th International Conference on Data Engineering, ICDE 2024
PB - IEEE Computer Society
Y2 - 13 May 2024 through 17 May 2024
ER -