TY - GEN
T1 - Beamer
T2 - 30th ACM International Conference on Information and Knowledge Management, CIKM 2021
AU - Bi, Nifei
AU - Chen, Xiansen
AU - Xu, Chen
AU - Zhou, Aoying
N1 - Publisher Copyright: © 2021 ACM.
PY - 2021/10/30
Y1 - 2021/10/30
N2 - Deep learning has made extraordinary progress in the last few years, focusing on improving the accuracy and speed of standard deep learning benchmarks. Nevertheless, datasets in production environments are often messy, which makes data cleaning crucial for DNN model training and inference. Existing solutions that combine big data processing systems and deep learning systems to accomplish the data cleaning, DNN model training and inference are internally tied to one of Spark or Flink. However, Spark and Flink usually show different performance under batch and stream processing workloads. In order to employ Spark in batch training and Flink in streaming inference, existing solutions incur the burden of maintaining two data cleaning programs. In this demonstration, we showcase Beamer: an end-to-end deep learning framework for unifying the data cleaning program when employing Spark in training and Flink in inference, respectively.
KW - batch training
KW - big data processing systems
KW - data cleaning
KW - deep learning systems
KW - streaming inference
UR - https://www.scopus.com/pages/publications/85119209108
U2 - 10.1145/3459637.3481977
DO - 10.1145/3459637.3481977
M3 - Conference contribution
AN - SCOPUS:85119209108
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 4685
EP - 4689
BT - CIKM 2021 - Proceedings of the 30th ACM International Conference on Information and Knowledge Management
PB - Association for Computing Machinery
Y2 - 1 November 2021 through 5 November 2021
ER -