Abstract
Distributed matrix computation is common in large-scale data processing and machine learning applications. Existing systems that support distributed matrix computation already explore incremental evaluation for iterative-convergent algorithms. However, they overlook the fact that, in a distributed environment, non-zero increments are scattered across different blocks. Additionally, we observe that incremental evaluation does not always outperform full evaluation. To address these issues, we propose matrix reorganization, which optimizes the physical layout on top of state-of-the-art optimized partition schemes and thereby accelerates incremental evaluation. More importantly, we propose a hybrid evaluation that efficiently interleaves full and incremental evaluation during the iterative process. In particular, it employs a cost model to compare the overhead of the two types of evaluation, and a selective comparison mechanism to reduce the overhead incurred by the comparison itself. To demonstrate the efficiency of our techniques, we implement HyMAC, a hybrid matrix computation system based on SystemML. Our experiments show that HyMAC reduces execution time on large datasets by 23% on average compared with the state-of-the-art optimization technique, and consequently outperforms SystemML, ScaLAPACK, and SciDB by an order of magnitude.
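To make the idea of interleaving full and incremental evaluation concrete, the following is a minimal single-machine sketch for one iteration of a matrix-vector product. It is not HyMAC's implementation: the function names (`full_eval`, `incremental_eval`, `choose_plan`, `hybrid_step`) and the cost formulas (counting multiply-adds, plus a fixed overhead for detecting the increments) are assumptions made purely for illustration, and the sketch ignores block partitioning and the selective comparison mechanism.

```python
def full_eval(A, x):
    """Full evaluation: recompute y = A x from scratch."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def incremental_eval(A, y_old, delta):
    """Incremental evaluation: y_new = y_old + A * delta.

    `delta` maps a column index to the non-zero change of x at that index,
    so only the columns of A that correspond to changed entries are touched.
    """
    y_new = list(y_old)
    for j, d in delta.items():
        for i, row in enumerate(A):
            y_new[i] += row[j] * d
    return y_new


def choose_plan(n_rows, n_cols, nnz_delta, incr_overhead=0):
    """Hypothetical cost model: compare estimated multiply-add counts.

    Full evaluation touches every entry of A; incremental evaluation touches
    one column per non-zero increment and pays an extra overhead (assumed
    here to be the cost of computing the increments).
    """
    cost_full = n_rows * n_cols
    cost_incr = n_rows * nnz_delta + incr_overhead
    return "incremental" if cost_incr < cost_full else "full"


def hybrid_step(A, x_old, y_old, x_new):
    """Run one iteration with whichever plan the cost model estimates is cheaper."""
    delta = {
        j: new - old
        for j, (old, new) in enumerate(zip(x_old, x_new))
        if new != old
    }
    plan = choose_plan(len(A), len(x_new), len(delta), incr_overhead=len(x_new))
    if plan == "incremental":
        return incremental_eval(A, y_old, delta), plan
    return full_eval(A, x_new), plan


if __name__ == "__main__":
    A = [[1, 2, 3], [4, 5, 6]]
    x0 = [1, 1, 1]
    y0 = full_eval(A, x0)
    # One entry changes: the sketch picks the incremental plan.
    print(hybrid_step(A, x0, y0, [1, 1, 2]))
    # Every entry changes: the sketch falls back to full evaluation.
    print(hybrid_step(A, x0, y0, [2, 2, 2]))
```

Both plans return the same result; the sketch only varies how much work is done to obtain it. This mirrors the abstract's observation that incremental evaluation pays off when few increments are non-zero and stops paying off as the increments become dense.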
| Original language | English |
|---|---|
| Pages (from-to) | 300-312 |
| Number of pages | 13 |
| Journal | Proceedings of the ACM SIGMOD International Conference on Management of Data |
| DOI | |
| Publication status | Published - 2021 |
| Event | 2021 International Conference on Management of Data, SIGMOD 2021 - Virtual, Online, China. Duration: 20 Jun 2021 → 25 Jun 2021 |