TY - GEN
T1 - Nebula
T2 - 29th ACM International Conference on Information and Knowledge Management, CIKM 2020
AU - Chen, Cen
AU - Wu, Bingzhe
AU - Wang, Li
AU - Chen, Chaochao
AU - Tan, Jin
AU - Wang, Lei
AU - Zhou, Jun
AU - Zhang, Benyu
N1 - Publisher Copyright:
© 2020 ACM.
PY - 2020/10/19
Y1 - 2020/10/19
N2 - With the rapid growth of data volume, data-driven machine learning models have become a necessary part of many industrial applications. Intuitively, using more high-quality data for training leads to better model performance. In reality, however, data are usually scattered and isolated across different organizations or companies. Such a "data isolation" problem stimulates both academia and industry to explore collaborative learning paradigms that build better models jointly from multiple data sources. Despite the potential performance gains, this learning paradigm inevitably faces privacy issues, especially in the Fintech domain, where data are sensitive by nature. In this paper, we present a privacy-preserving collaborative learning system in Ant Financial, named Nebula. Our system aims to facilitate privacy-preserving collaborative model training for industrial-scale applications. Our system is built upon a ring-allreduce MPI-based distributed framework. On top of that, with several optimization strategies and a novel sharing scheme, our system is able to scale up to tens of millions of data samples with hundreds of thousands of features, achieving more than 100x speedup over existing state-of-the-art implementations.
AB - With the rapid growth of data volume, data-driven machine learning models have become a necessary part of many industrial applications. Intuitively, using more high-quality data for training leads to better model performance. In reality, however, data are usually scattered and isolated across different organizations or companies. Such a "data isolation" problem stimulates both academia and industry to explore collaborative learning paradigms that build better models jointly from multiple data sources. Despite the potential performance gains, this learning paradigm inevitably faces privacy issues, especially in the Fintech domain, where data are sensitive by nature. In this paper, we present a privacy-preserving collaborative learning system in Ant Financial, named Nebula. Our system aims to facilitate privacy-preserving collaborative model training for industrial-scale applications. Our system is built upon a ring-allreduce MPI-based distributed framework. On top of that, with several optimization strategies and a novel sharing scheme, our system is able to scale up to tens of millions of data samples with hundreds of thousands of features, achieving more than 100x speedup over existing state-of-the-art implementations.
KW - collaborative learning
KW - privacy-preserving machine learning
UR - https://www.scopus.com/pages/publications/85095866216
U2 - 10.1145/3340531.3417418
DO - 10.1145/3340531.3417418
M3 - Conference contribution
AN - SCOPUS:85095866216
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 3369
EP - 3372
BT - CIKM 2020 - Proceedings of the 29th ACM International Conference on Information and Knowledge Management
PB - Association for Computing Machinery
Y2 - 19 October 2020 through 23 October 2020
ER -