TY - JOUR
T1 - CaBaFL
T2 - Asynchronous Federated Learning via Hierarchical Cache and Feature Balance
AU - Xia, Zeke
AU - Hu, Ming
AU - Yan, Dengke
AU - Xie, Xiaofei
AU - Li, Tianlin
AU - Li, Anran
AU - Zhou, Junlong
AU - Chen, Mingsong
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Federated learning (FL), a promising distributed machine learning paradigm, has been widely adopted in Artificial Intelligence of Things (AIoT) applications. However, the efficiency and inference capability of FL are seriously limited by the presence of stragglers and by data imbalance across massive AIoT devices, respectively. To address these challenges, we present a novel asynchronous FL approach named CaBaFL, which includes a hierarchical cache-based aggregation mechanism and a feature balance-guided device selection strategy. CaBaFL maintains multiple intermediate models simultaneously for local training. The hierarchical cache-based aggregation mechanism enables each intermediate model to be trained on multiple devices, aligning the training time and mitigating the straggler issue. Specifically, each intermediate model is stored in a low-level cache for local training; once it has been trained by a sufficient number of local devices, it is moved to a high-level cache for aggregation. To address the problem of imbalanced data, the feature balance-guided device selection strategy in CaBaFL uses the activation distribution as a metric, which enables each intermediate model to be trained across devices with a fully balanced overall data distribution before aggregation. Experimental results show that, compared with state-of-the-art FL methods, CaBaFL achieves up to 9.26× training acceleration and a 19.71% accuracy improvement.
AB - Federated learning (FL), a promising distributed machine learning paradigm, has been widely adopted in Artificial Intelligence of Things (AIoT) applications. However, the efficiency and inference capability of FL are seriously limited by the presence of stragglers and by data imbalance across massive AIoT devices, respectively. To address these challenges, we present a novel asynchronous FL approach named CaBaFL, which includes a hierarchical cache-based aggregation mechanism and a feature balance-guided device selection strategy. CaBaFL maintains multiple intermediate models simultaneously for local training. The hierarchical cache-based aggregation mechanism enables each intermediate model to be trained on multiple devices, aligning the training time and mitigating the straggler issue. Specifically, each intermediate model is stored in a low-level cache for local training; once it has been trained by a sufficient number of local devices, it is moved to a high-level cache for aggregation. To address the problem of imbalanced data, the feature balance-guided device selection strategy in CaBaFL uses the activation distribution as a metric, which enables each intermediate model to be trained across devices with a fully balanced overall data distribution before aggregation. Experimental results show that, compared with state-of-the-art FL methods, CaBaFL achieves up to 9.26× training acceleration and a 19.71% accuracy improvement.
KW - Artificial Intelligence of Things (AIoT)
KW - asynchronous federated learning (FL)
KW - data/device heterogeneity
KW - feature balance
UR - https://www.scopus.com/pages/publications/85209221575
U2 - 10.1109/TCAD.2024.3446881
DO - 10.1109/TCAD.2024.3446881
M3 - Article
AN - SCOPUS:85209221575
SN - 0278-0070
VL - 43
SP - 4057
EP - 4068
JO - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
JF - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
IS - 11
ER -