Scheduling Data Processing Pipelines for Incremental Training on MLP-based Recommendation Models

  • Zihao Chen
  • Chenyang Zhang
  • Chen Xu*
  • Zhao Zhang
  • Jiaqiang Wang
  • Weining Qian
  • Aoying Zhou
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Multi-layer Perceptron (MLP)-based models are widely used in modern recommendation applications. In practice, industrial recommendation scenarios frequently launch continuous incremental training jobs that run for only one epoch to capture real-time user features. Such jobs are shorter than full training and spend a larger proportion of their time on feature processing. To fully utilize fragmented resources, our model engineering team at Tencent explores using resource-constrained CPU clusters to run these incremental training workloads. To improve their efficiency, we identify scheduling optimizations that overlap feature processing with model training at the level of data processing pipelines. In particular, we propose an intra-pipeline scheduling strategy that dynamically prefetches feature processing operators to fill the idle CPU time during the communication of embedding lookups. Furthermore, we propose an inter-pipeline scheduling strategy that balances the resource demands of different pipelines: it prioritizes the execution of critical pipelines and overlaps the communication in critical pipelines with the execution of non-critical pipelines. Based on these two scheduling strategies, we implement a novel incremental recommendation training framework, RECS, on top of TensorFlow. In our experimental studies, RECS achieves a 1.36x speedup over existing solutions on industrial workloads.
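The intra-pipeline idea described in the abstract can be illustrated with a toy timing model. This is a minimal sketch under assumed simplifications, not the RECS implementation: the names StepCost, sequential_makespan and prefetch_makespan are hypothetical, costs are abstract time units, each training step is assumed to run its computation before a blocking embedding-lookup communication phase during which the CPU would otherwise sit idle, and prefetching is modeled as a simple greedy, in-order fill of that idle window with feature-processing operators of later batches.

```python
from dataclasses import dataclass


@dataclass
class StepCost:
    """Hypothetical cost model for one training step (abstract time units)."""
    feature_ops: list[float]  # CPU durations of this batch's feature-processing operators
    compute: float            # CPU time of the training computation
    comm: float               # embedding-lookup communication; CPU assumed idle meanwhile


def sequential_makespan(steps: list[StepCost]) -> float:
    """No overlap: each step processes features, computes, then waits on comm."""
    return sum(sum(s.feature_ops) + s.compute + s.comm for s in steps)


def prefetch_makespan(steps: list[StepCost]) -> float:
    """Greedy, in-order prefetching (an assumed policy): during each
    communication window, run pending feature-processing operators of later
    batches while the next operator fits in the remaining idle time."""
    pending = [list(s.feature_ops) for s in steps]  # copy; inputs stay untouched
    t = 0.0
    for i, s in enumerate(steps):
        # Run whatever part of batch i's feature processing was not prefetched.
        t += sum(pending[i])
        pending[i] = []
        t += s.compute
        # Communication window: fill the idle CPU time with later batches' operators.
        idle = s.comm
        j = i + 1
        while j < len(steps) and idle > 0:
            ops = pending[j]
            while ops and ops[0] <= idle:
                idle -= ops.pop(0)
            if ops:  # next operator does not fit; stop to preserve operator order
                break
            j += 1
        t += s.comm  # the step still waits for the lookup to complete
    return t


if __name__ == "__main__":
    steps = [StepCost(feature_ops=[2, 2], compute=3, comm=4) for _ in range(3)]
    print(sequential_makespan(steps), prefetch_makespan(steps))  # prints "33 25.0"
```

With these made-up costs the toy model takes 33 time units without overlap and 25 with prefetching, because the feature processing of batches 2 and 3 is hidden inside the preceding communication windows. The numbers only illustrate the mechanism and are not results from the paper; a real scheduler would also have to account for operator dependencies, memory, and variable communication latency.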

Original language: English
Title of host publication: SIGMOD-Companion 2025 - Companion of the 2025 International Conference on Management of Data
Editors: Amol Deshpande, Ashraf Aboulnaga, Babak Salimi, Badrish Chandramouli, Bill Howe, Boon Thau Loo, Boris Glavic, Carlo Curino, Daisy Zhe Wang, Dan Suciu, Daniel Abadi, Divesh Srivastava, Eugene Wu, Faisal Nawab, Ihab Ilyas, Jeffrey Naughton, Jennie Rogers, Jignesh Patel, Joy Arulraj, Jun Yang, Karima Echihabi, Kenneth Ross, Khuzaima Daudjee, Laks Lakshmanan, Minos Garofalakis, Mirek Riedewald, Mohamed Mokbel, Mourad Ouzzani, Oliver Kennedy, Paolo Papotti, Peter Alvaro, Peter Bailis, Renee Miller, Senjuti Basu Roy, Sergey Melnik, Stratos Idreos, Sudeepa Roy, Theodoros Rekatsinas, Viktor Leis, Wenchao Zhou, Wolfgang Gatterbauer, Zack Ives
Publisher: Association for Computing Machinery
Pages: 350-363
Number of pages: 14
ISBN (Electronic): 9798400715648
State: Published - 22 Jun 2025
Event: 2025 ACM SIGMOD/PODS International Conference on Management of Data, SIGMOD-Companion 2025 - Berlin, Germany
Duration: 22 Jun 2025 to 27 Jun 2025

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (Print): 0730-8078

Conference

Conference: 2025 ACM SIGMOD/PODS International Conference on Management of Data, SIGMOD-Companion 2025
Country/Territory: Germany
City: Berlin
Period: 22/06/25 to 27/06/25

Keywords

  • data processing pipeline
  • incremental training
  • recommendation model
  • scheduling
