SylphDB: An Active and Adaptive LSM Engine for Update-Intensive Workloads

Jun Peng Zhu, Zhiwei Ye, Xiaolong He, Peng Cai, Xuan Zhou, Aoying Zhou, Dunbo Cai, Ling Qian, Kai Xu, Liu Tang, Qi Liu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Update-intensive workloads are prevalent in contemporary OLTP and AI/ML scenarios. An update operation typically involves deleting the old version of the target record and then inserting a new version. In this work, we demonstrate that an LSM-tree faces two issues when dealing with update-intensive workloads. First, deleted old versions are not garbage collected until they merge with their new versions during compaction, which may lead to wasted space and write amplification. Second, an update operation commonly modifies only a small fraction of a data record, such as one of a hundred attributes. However, state-of-the-art LSM-trees fail to make effective use of incremental storage, that is, storing only the updated fraction rather than the entire new version. In this paper, we propose two techniques, active and fast garbage collection and adaptive incremental updating, to address these two issues, respectively. Active and fast garbage collection probes the distribution of invalid data versions in an LSM-tree and performs garbage collection more promptly. Adaptive incremental updating applies different storage modes to update operations to balance write and read amplification as far as possible. Based on these techniques, we introduce SylphDB, which is implemented on the RocksDB codebase and optimized for update-intensive workloads. Experimental results demonstrate that, compared to traditional LSM-tree based systems, SylphDB can improve the efficiency of garbage collection by 2× and reduce write amplification by 20%.
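The trade-off behind incremental updating can be illustrated with a toy model. The sketch below is not SylphDB's algorithm or API. The class `ToyStore`, the thresholds `max_chain` and `delta_ratio`, and the switching rule are all hypothetical, chosen only to show why storing deltas lowers write volume while lengthening the chain a read has to merge.

```python
from dataclasses import dataclass, field

FULL, DELTA = "full", "delta"


@dataclass
class ToyStore:
    """Append-only version log per key; a stand-in for LSM runs, not SylphDB."""
    max_chain: int = 4          # hypothetical cap on deltas stacked on one base
    delta_ratio: float = 0.25   # hypothetical: delta only below this changed fraction
    log: dict = field(default_factory=dict)  # key -> list of (mode, payload)
    attrs_written: int = 0      # crude proxy for bytes written

    def put(self, key, record):
        """Insert a complete record as a full version."""
        self.log.setdefault(key, []).append((FULL, dict(record)))
        self.attrs_written += len(record)

    def _chain_length(self, key):
        """Number of deltas stacked on top of the latest full version."""
        chain = 0
        for mode, _ in reversed(self.log[key]):
            if mode == FULL:
                break
            chain += 1
        return chain

    def get(self, key):
        """Rebuild the record: latest full version plus every later delta."""
        versions = self.log[key]
        start = max(i for i, (mode, _) in enumerate(versions) if mode == FULL)
        record = dict(versions[start][1])
        for _, payload in versions[start + 1:]:
            record.update(payload)
        return record

    def update(self, key, changes):
        """Store a delta when the change is small and the chain is short,
        otherwise rewrite the whole record. Returns the mode chosen."""
        current = self.get(key)
        small = len(changes) / len(current) < self.delta_ratio
        if small and self._chain_length(key) < self.max_chain:
            self.log[key].append((DELTA, dict(changes)))
            self.attrs_written += len(changes)
            return DELTA
        merged = {**current, **changes}
        self.log[key].append((FULL, merged))
        self.attrs_written += len(merged)
        return FULL


# A record with one hundred attributes, updated one attribute at a time.
store = ToyStore()
store.put("k", {f"a{i}": 0 for i in range(100)})
modes = [store.update("k", {"a0": n}) for n in range(1, 7)]
print(modes, store.attrs_written)
```

In this toy, four single-attribute updates are stored as deltas, the fifth forces a full rewrite because the chain has reached `max_chain`, and the sixth starts a new chain. Writing every update in full would have cost 700 attribute writes instead of 205, at the price of reads that merge up to four deltas.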

Original language: English
Title of host publication: Proceedings - 2025 IEEE 41st International Conference on Data Engineering, ICDE 2025
Publisher: IEEE Computer Society
Pages: 4360-4372
Number of pages: 13
ISBN (Electronic): 9798331536039
State: Published - 2025
Event: 41st IEEE International Conference on Data Engineering, ICDE 2025 - Hong Kong, China
Duration: 19 May 2025 to 23 May 2025

Publication series

Name: Proceedings - International Conference on Data Engineering
ISSN (Print): 1084-4627
ISSN (Electronic): 2375-0286

Conference

Conference: 41st IEEE International Conference on Data Engineering, ICDE 2025
Country/Territory: China
City: Hong Kong
Period: 19/05/25 to 23/05/25

Keywords

  • garbage collection
  • LSM-tree
