TY - JOUR
T1 - BLB-gcForest: A High-Performance Distributed Deep Forest With Adaptive Sub-Forest Splitting
AU - Chen, Zexi
AU - Wang, Ting
AU - Cai, Haibin
AU - Mondal, Subrota Kumar
AU - Sahoo, Jyoti Prakash
N1 - Publisher Copyright:
© 1990-2012 IEEE.
PY - 2022/11/1
Y1 - 2022/11/1
AB - As a competitive alternative to deep neural networks, Deep Forest emerges with features like low complexity, fewer hyper-parameters, and good robustness, which are highly desired in distributed computing applications and ecosystems. Recently, an efficient distributed Deep Forest system, named ForestLayer, was proposed, designing a fine-grained sub-Forest-based task-parallel algorithm to improve the parallel computing efficiency of Deep Forest. However, the sub-Forest splitting of ForestLayer is static and one-off, with no adaptability to the computing environment, even though the splitting granularity has a significant impact on system performance. To further improve the computing efficiency and scalability of distributed Deep Forest, in this paper, we propose a novel distributed Deep Forest algorithm, named BLB-gcForest (Bag of Little Bootstraps-gcForest), which augments the gcForest (multi-Grained Cascade Forest) approach for constructing Deep Forest. BLB-gcForest carries out parallel computation for each tree in sub-Forests at a finer parallel granularity and integrates the Bag of Little Bootstraps (BLB) mechanism to reduce the massive number of feature instances transmitted to Cascade Forest layers, thereby improving both computation efficiency and communication efficiency. Moreover, to solve the problem of forest splitting granularity, we further design an adaptive sub-Forest splitting algorithm to ensure maximum resource utilization for the parallel computation of each sub-Forest. Experimental results on four well-known large-scale datasets, namely YEAST, LETTER, MNIST, and CIFAR10, show that the training efficiency of BLB-gcForest achieves up to 20.3x and 1.64x
KW - Deep forest
KW - big data bootstrap
KW - distributed AI
KW - distributed computing
UR - https://www.scopus.com/pages/publications/85121356295
U2 - 10.1109/TPDS.2021.3133544
DO - 10.1109/TPDS.2021.3133544
M3 - Article
AN - SCOPUS:85121356295
SN - 1045-9219
VL - 33
SP - 3141
EP - 3152
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
IS - 11
ER -