TY - GEN
T1 - Redundancy Elimination in Distributed Matrix Computation
AU - Chen, Zihao
AU - Han, Baokun
AU - Xu, Chen
AU - Qian, Weining
AU - Zhou, Aoying
N1 - Publisher Copyright:
© 2022 ACM.
PY - 2022/6
Y1 - 2022/6
N2 - As matrix computation becomes increasingly prevalent in large-scale data analysis, distributed matrix computation solutions have emerged. These solutions support query interfaces of linear algebra expressions, which often contain redundant subexpressions, i.e., common and loop-constant subexpressions. Hence, existing compilers rewrite queries to eliminate such redundancy. However, due to the large search space, they fail to find all redundant subexpressions, especially for matrix multiplication chains. Furthermore, redundancy elimination may change the original execution order of operators, and have negative impacts. To reduce the large search space and avoid the negative impacts, we propose automatic elimination and adaptive elimination, respectively. In particular, automatic elimination adopts a block-wise search that exploits the properties of matrix computation for speed-up. Adaptive elimination employs a cost model and a dynamic programming-based method to generate efficient plans for redundancy elimination. Finally, we implement ReMac atop SystemDS, eliminating redundancy in distributed matrix computation. In our experiments, ReMac is able to generate efficient execution plans at affordable overhead costs, and outperforms state-of-the-art solutions by an order of magnitude.
AB - As matrix computation becomes increasingly prevalent in large-scale data analysis, distributed matrix computation solutions have emerged. These solutions support query interfaces of linear algebra expressions, which often contain redundant subexpressions, i.e., common and loop-constant subexpressions. Hence, existing compilers rewrite queries to eliminate such redundancy. However, due to the large search space, they fail to find all redundant subexpressions, especially for matrix multiplication chains. Furthermore, redundancy elimination may change the original execution order of operators, and have negative impacts. To reduce the large search space and avoid the negative impacts, we propose automatic elimination and adaptive elimination, respectively. In particular, automatic elimination adopts a block-wise search that exploits the properties of matrix computation for speed-up. Adaptive elimination employs a cost model and a dynamic programming-based method to generate efficient plans for redundancy elimination. Finally, we implement ReMac atop SystemDS, eliminating redundancy in distributed matrix computation. In our experiments, ReMac is able to generate efficient execution plans at affordable overhead costs, and outperforms state-of-the-art solutions by an order of magnitude.
KW - distributed system
KW - matrix computation
KW - redundancy elimination
UR - https://www.scopus.com/pages/publications/85132692955
U2 - 10.1145/3514221.3517877
DO - 10.1145/3514221.3517877
M3 - 会议稿件
AN - SCOPUS:85132692955
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 573
EP - 586
BT - SIGMOD 2022 - Proceedings of the 2022 International Conference on Management of Data
PB - Association for Computing Machinery
T2 - 2022 ACM SIGMOD International Conference on the Management of Data, SIGMOD 2022
Y2 - 12 June 2022 through 17 June 2022
ER -