Karst: Transactional Data Ingestion Without Blocking on a Scalable Architecture

  • Zhifang Li
  • Beicheng Peng
  • Qiuli Huang
  • Chuliang Weng*
  • *Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


Abstract

Although real-time analytics on up-to-date datasets has become an emerging demand, many big data systems are still designed for offline analytics. In particular, for critical applications such as Fintech, transactional data ingestion ensures a timely, always-correct, and scalable dataset. To carry out append-only ingestion, existing OLTP/HTAP systems rely on strict transactions with imperfect scalability, while NoSQL-like systems support scalable but relaxed transactions. Ensuring essential transactional guarantees without harming scalability is therefore a non-trivial problem. This paper proposes Karst, which brings transactional data ingestion to existing offline analytics systems. We observe that resolving transactional data ingestion with blocking two-phase commit (2PC) is a performance killer for partitioned analytical systems. Karst introduces a scalable protocol called metadata-oriented commit (MOC) that converts each distributed transaction into multiple partial transactions to avoid 2PC. Moreover, to ingest massive data into many partitions, Karst also employs lazy persistence, lightweight logging, and optimized data traffic. In experiments, Karst achieves up to roughly 2x to 10x the performance of relevant systems and also shows remarkable scalability.
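The abstract names metadata-oriented commit (MOC) but does not spell out its mechanics. As a hedged, non-authoritative illustration of the general idea it states (replacing a blocking 2PC vote with per-partition partial transactions), the Python sketch below assumes that each partial transaction appends to a single partition independently and that one metadata record acts as the commit point controlling visibility. The names `Partition`, `Metadata`, `ingest`, and `scan` are hypothetical; this is not Karst's actual protocol, and it ignores failures, recovery, lazy persistence, and logging.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Hypothetical analytical partition holding append-only (txn_id, row) records."""
    pid: int
    records: list = field(default_factory=list)

    def append_partial(self, txn_id, rows):
        # Assumption: a partial transaction touches only this partition and
        # appends without a prepare/vote round, so it never waits on peers.
        for row in rows:
            self.records.append((txn_id, row))


class Metadata:
    """Hypothetical metadata service that holds the commit point (assumed, not from the paper)."""

    def __init__(self):
        self.pending = {}       # txn_id -> set of partition ids still writing
        self.committed = set()  # txn ids whose rows are visible to readers

    def begin(self, txn_id, pids):
        self.pending[txn_id] = set(pids)

    def partial_done(self, txn_id, pid):
        waiting = self.pending[txn_id]
        waiting.discard(pid)
        if not waiting:
            # Commit is a single local metadata update, not a distributed vote.
            del self.pending[txn_id]
            self.committed.add(txn_id)


def ingest(txn_id, rows_by_pid, partitions, meta):
    """Split one multi-partition transaction into per-partition partial transactions."""
    meta.begin(txn_id, rows_by_pid.keys())
    for pid, rows in rows_by_pid.items():
        partitions[pid].append_partial(txn_id, rows)
        meta.partial_done(txn_id, pid)


def scan(partition, meta):
    """Readers see only rows whose transaction is committed in the metadata."""
    return [row for txn_id, row in partition.records if txn_id in meta.committed]
```

In this toy model, rows from a transaction whose partial transactions have not all finished are already physically appended but stay invisible to `scan` until the metadata marks the transaction committed, so no partition ever blocks waiting for another partition's vote.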

Original language: English
Pages (from-to): 2241-2253
Number of pages: 13
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 34
Issue number: 5
State: Published - 1 May 2022

Keywords

  • Data ingestion
  • distributed transaction
  • real-time analytics
  • two-phase commit

