TY - JOUR
T1 - HaloFL
T2 - Efficient Heterogeneity-Aware Federated Learning Through Optimal Submodel Extraction and Dynamic Sparse Adjustment
AU - Lian, Zirui
AU - Cao, Qianyue
AU - Liang, Chao
AU - Cao, Jing
AU - Zhu, Zongwei
AU - Yang, Zhi
AU - Ji, Cheng
AU - Li, Changlong
AU - Zhou, Xuehai
N1 - Publisher Copyright:
© 1982-2012 IEEE.
PY - 2025
Y1 - 2025
N2 - Federated learning (FL) is an advanced framework that enables collaborative training of machine learning models across edge devices. An effective strategy to enhance training efficiency is to allocate the optimal submodel based on each device’s resource capabilities. However, system heterogeneity significantly increases the difficulty of allocating submodel parameter budgets appropriately for each device, leading to the straggler problem. Meanwhile, data heterogeneity complicates the selection of the optimal submodel structure for specific devices, thereby impacting training performance. Furthermore, the dynamic nature of edge environments, such as fluctuations in network communication and computational resources, exacerbates these challenges, making it even more difficult to precisely extract appropriately sized and structured submodels from the global model. To address the challenges in heterogeneous training environments, we propose an efficient FL framework, namely, HaloFL. The framework dynamically adjusts the structure and parameter budget of submodels during training by evaluating three dimensions: 1) model-wise performance; 2) layer-wise performance; and 3) unit-wise performance. First, we design a data-aware model unit importance evaluation method to determine the optimal submodel structure for different data distributions. Next, using this evaluation method, we analyze the importance of model layers and reallocate parameters from noncritical layers to critical layers within a fixed parameter budget, further optimizing the submodel structure. Finally, we introduce a resource-aware dual-UCB multiarmed bandit agent, which dynamically adjusts the total parameter budget of submodels according to changes in the training environment, allowing the framework to better adapt to the performance differences of heterogeneous devices. Experimental results demonstrate that HaloFL exhibits outstanding efficiency in various dynamic and heterogeneous scenarios, achieving up to a 14.80% improvement in accuracy and a 3.06× speedup compared to existing FL frameworks.
AB - Federated learning (FL) is an advanced framework that enables collaborative training of machine learning models across edge devices. An effective strategy to enhance training efficiency is to allocate the optimal submodel based on each device’s resource capabilities. However, system heterogeneity significantly increases the difficulty of allocating submodel parameter budgets appropriately for each device, leading to the straggler problem. Meanwhile, data heterogeneity complicates the selection of the optimal submodel structure for specific devices, thereby impacting training performance. Furthermore, the dynamic nature of edge environments, such as fluctuations in network communication and computational resources, exacerbates these challenges, making it even more difficult to precisely extract appropriately sized and structured submodels from the global model. To address the challenges in heterogeneous training environments, we propose an efficient FL framework, namely, HaloFL. The framework dynamically adjusts the structure and parameter budget of submodels during training by evaluating three dimensions: 1) model-wise performance; 2) layer-wise performance; and 3) unit-wise performance. First, we design a data-aware model unit importance evaluation method to determine the optimal submodel structure for different data distributions. Next, using this evaluation method, we analyze the importance of model layers and reallocate parameters from noncritical layers to critical layers within a fixed parameter budget, further optimizing the submodel structure. Finally, we introduce a resource-aware dual-UCB multiarmed bandit agent, which dynamically adjusts the total parameter budget of submodels according to changes in the training environment, allowing the framework to better adapt to the performance differences of heterogeneous devices. Experimental results demonstrate that HaloFL exhibits outstanding efficiency in various dynamic and heterogeneous scenarios, achieving up to a 14.80% improvement in accuracy and a 3.06× speedup compared to existing FL frameworks.
KW - Dynamic scenarios
KW - edge computing
KW - embedded systems
KW - federated learning (FL)
KW - heterogeneous training
UR - https://www.scopus.com/pages/publications/86000654880
U2 - 10.1109/TCAD.2025.3548003
DO - 10.1109/TCAD.2025.3548003
M3 - Article
AN - SCOPUS:86000654880
SN - 0278-0070
VL - 44
SP - 3518
EP - 3531
JO - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
JF - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
IS - 9
ER -