Abstract
Although real-time analytics on up-to-date datasets is an emerging demand, many big data systems are still designed for offline analytics. For critical applications such as Fintech in particular, transactional data ingestion ensures a timely, always-correct, and scalable dataset. To carry out append-only ingestion, existing OLTP/HTAP systems rely on strict transactions with imperfect scalability, while NoSQL-like systems support scalable but relaxed transactions. Ensuring essential transactional guarantees without harming scalability is a non-trivial problem. This paper proposes Karst, which brings transactional data ingestion to existing offline analytics. We observe that using blocking two-phase commit (2PC) for transactional data ingestion is a performance killer in partitioned analytical systems. Karst introduces a scalable protocol called metadata-oriented commit (MOC) that converts each distributed transaction into multiple partial transactions to avoid 2PC. Moreover, to ingest massive data into a large number of partitions, Karst also employs lazy persistence, lightweight logging, and optimized data traffic. In experiments, Karst achieves about 2x to 10x the performance of relevant systems and also shows remarkable scalability.
| Original language | English |
|---|---|
| Pages (from-to) | 2241-2253 |
| Number of pages | 13 |
| Journal | IEEE Transactions on Knowledge and Data Engineering |
| Volume | 34 |
| Issue number | 5 |
| DOIs | |
| State | Published - 1 May 2022 |
Keywords
- Data ingestion
- distributed transaction
- real-time analytics
- two-phase commit
Fingerprint
Dive into the research topics of 'Karst: Transactional Data Ingestion Without Blocking on a Scalable Architecture'. Together they form a unique fingerprint.