TY - GEN
T1 - ByteCard
T2 - 2024 International Conference on Management of Data, SIGMOD 2024
AU - Han, Yuxing
AU - Wang, Haoyu
AU - Chen, Lixiang
AU - Dong, Yifeng
AU - Chen, Xing
AU - Yu, Benquan
AU - Yang, Chengcheng
AU - Qian, Weining
N1 - Publisher Copyright:
© 2024 ACM.
PY - 2024/6/9
Y1 - 2024/6/9
AB - Cardinality estimation is a critical component and a longstanding challenge in modern data warehouses. ByteHouse, ByteDance's cloud-native engine for extensive data analysis in exabyte-scale environments, serves numerous internal decision-making business scenarios. With the increasing demand for ByteHouse, cardinality estimation has become the bottleneck for efficient query processing. Specifically, the existing query optimizer of ByteHouse uses a traditional Selinger-like cardinality estimator, which can produce substantial estimation errors and, in turn, suboptimal query plans. To improve cardinality estimation accuracy while keeping inference overhead practical, we develop ByteCard, a framework that enables efficient training and integration of learned cardinality estimators. Furthermore, ByteCard adapts recent advances in cardinality estimation to build models that balance accuracy and practicality (e.g., inference latency, model size, training overhead). We observe significant query processing speed-ups in ByteHouse after replacing the existing cardinality estimator with ByteCard in several optimization scenarios. Evaluations on real-world datasets show that integrating ByteCard improves 99th-percentile query latency by up to 30%. Finally, we share our experience in engineering advanced cardinality estimators, which can help ByteHouse integrate more learning-based solutions into the critical query execution path in the future.
KW - ByteCard
KW - ByteHouse
KW - inference engine
KW - learned cardinality estimators
KW - ModelForge service
KW - query optimization
UR - https://www.scopus.com/pages/publications/85196400410
U2 - 10.1145/3626246.3653376
DO - 10.1145/3626246.3653376
M3 - Conference contribution
AN - SCOPUS:85196400410
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 41
EP - 54
BT - SIGMOD-Companion 2024 - Companion of the 2024 International Conference on Management of Data
PB - Association for Computing Machinery
Y2 - 9 June 2024 through 15 June 2024
ER -