Abstract
Distributed matrix computation is common in large-scale data processing and machine learning applications. Existing systems that support distributed matrix computation already explore incremental evaluation for iterative-convergent algorithms. However, they are oblivious to the fact that non-zero increments are scattered across different blocks in a distributed environment. Additionally, we observe that incremental evaluation does not always outperform full evaluation. To address these issues, we propose matrix reorganization, which optimizes the physical layout on top of state-of-the-art optimized partition schemes and thereby accelerates incremental evaluation. More importantly, we propose hybrid evaluation, which efficiently interleaves full and incremental evaluation during the iterative process. In particular, it employs a cost model to compare the overhead costs of the two types of evaluation, and a selective comparison mechanism to reduce the overhead incurred by the comparison itself. To demonstrate the efficiency of our techniques, we implement HyMAC, a hybrid matrix computation system based on SystemML. Our experiments show that HyMAC reduces execution time on large datasets by 23% on average compared with the state-of-the-art optimization technique and consequently outperforms SystemML, ScaLAPACK, and SciDB by an order of magnitude.
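To make the trade-off between full and incremental evaluation concrete, the following is a minimal single-machine sketch, not HyMAC's implementation or its actual cost model. It assumes an iterative matrix-vector step y = A·x and uses linearity, A·(x + Δx) = A·x + A·Δx, so incremental evaluation only touches the columns of A that match non-zero entries of Δx. The names `full_eval`, `incremental_eval`, `choose_evaluation`, `hybrid_step`, and the `overhead` parameter are hypothetical, and the cost model simply counts multiply-adds.

```python
import numpy as np


def full_eval(A, x):
    # Full evaluation: recompute y = A @ x from scratch (about rows * cols multiply-adds).
    return A @ x


def incremental_eval(A, y_prev, delta_x):
    # Incremental evaluation: A @ (x + dx) = A @ x + A @ dx, so only the
    # columns of A that correspond to non-zero entries of dx are used.
    nz = np.flatnonzero(delta_x)
    return y_prev + A[:, nz] @ delta_x[nz]


def choose_evaluation(A, delta_x, overhead=0.0):
    # Hypothetical cost model (an assumption, not the paper's): compare the
    # multiply-add counts of both strategies, plus an optional fixed overhead
    # charged to the incremental path (e.g. tracking or shipping increments).
    rows, cols = A.shape
    full_cost = rows * cols
    incr_cost = rows * np.count_nonzero(delta_x) + overhead
    return "incremental" if incr_cost < full_cost else "full"


def hybrid_step(A, x_prev, y_prev, x_new, overhead=0.0):
    # One iteration of a hybrid scheme: pick the cheaper strategy per step.
    delta_x = x_new - x_prev
    if choose_evaluation(A, delta_x, overhead) == "incremental":
        return incremental_eval(A, y_prev, delta_x), "incremental"
    return full_eval(A, x_new), "full"


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.random((4, 6))
    x0 = rng.random(6)
    y0 = A @ x0

    x1 = x0.copy()
    x1[2] += 0.5  # only one entry changes, so the increment is sparse
    y1, mode = hybrid_step(A, x0, y0, x1)
    print(mode, np.allclose(y1, A @ x1))
```

When only one entry of x changes, the sketch chooses incremental evaluation (4 multiply-adds instead of 24); when every entry changes, the incremental cost is no lower than the full cost and it falls back to full evaluation. The sketch does not model block-level scattering of increments, matrix reorganization, or the selective comparison mechanism described above.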
| Original language | English |
|---|---|
| Pages (from-to) | 300-312 |
| Number of pages | 13 |
| Journal | Proceedings of the ACM SIGMOD International Conference on Management of Data |
| DOIs | |
| State | Published - 2021 |
| Event | 2021 International Conference on Management of Data, SIGMOD 2021 - Virtual, Online, China; Duration: 20 Jun 2021 → 25 Jun 2021 |
Keywords
- hybrid evaluation
- iteration
- matrix computation