TY - JOUR
T1 - Karst
T2 - Transactional Data Ingestion Without Blocking on a Scalable Architecture
AU - Li, Zhifang
AU - Peng, Beicheng
AU - Huang, Qiuli
AU - Weng, Chuliang
N1 - Publisher Copyright: © 1989-2012 IEEE.
PY - 2022/5/1
Y1 - 2022/5/1
N2 - Although real-time analytics on the up-to-date dataset has become an emerging demand, many big data systems are still designed for offline analytics. Particularly, for critical applications like Fintech, transactional data ingestion ensures a timely, always-correct, and scalable dataset. To carry out append-only ingestion, existing OLTP/HTAP systems are based on strict transactions with imperfect scalability, while NoSQL-like systems support scalable but relaxed transactions. How to ensure essential transactional guarantees without harming scalability seems to be a non-trivial issue. This paper proposes Karst to bring transactional data ingestion for existing offline analytics. We notice that blocking two-phase commit (2PC) to resolve transactional data ingestion is a performance killer for the partitioned analytical systems. Karst introduces a scalable protocol called metadata-oriented commit (MOC) that converts each distributed transaction into multiple partial transactions to avoid 2PC. Moreover, to ingest massive data into plenty of partitions, Karst also employs lazy persistence, lightweight logging, and optimized data traffic. In experiments, Karst could achieve up to about 2x∼10x performance over relevant systems and also shows remarkable scalability.
AB - Although real-time analytics on the up-to-date dataset has become an emerging demand, many big data systems are still designed for offline analytics. Particularly, for critical applications like Fintech, transactional data ingestion ensures a timely, always-correct, and scalable dataset. To carry out append-only ingestion, existing OLTP/HTAP systems are based on strict transactions with imperfect scalability, while NoSQL-like systems support scalable but relaxed transactions. How to ensure essential transactional guarantees without harming scalability seems to be a non-trivial issue. This paper proposes Karst to bring transactional data ingestion for existing offline analytics. We notice that blocking two-phase commit (2PC) to resolve transactional data ingestion is a performance killer for the partitioned analytical systems. Karst introduces a scalable protocol called metadata-oriented commit (MOC) that converts each distributed transaction into multiple partial transactions to avoid 2PC. Moreover, to ingest massive data into plenty of partitions, Karst also employs lazy persistence, lightweight logging, and optimized data traffic. In experiments, Karst could achieve up to about 2x∼10x performance over relevant systems and also shows remarkable scalability.
KW - Data ingestion
KW - distributed transaction
KW - real-time analytics
KW - two-phase commit
UR - https://www.scopus.com/pages/publications/85128165301
U2 - 10.1109/TKDE.2020.3011510
DO - 10.1109/TKDE.2020.3011510
M3 - Article
AN - SCOPUS:85128165301
SN - 1041-4347
VL - 34
SP - 2241
EP - 2253
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 5
ER -