TY - GEN
T1 - Federated Prototype Guided Adaption for Vision-Language Models
AU - Liu, Youchao
AU - Huang, Dingjiang
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Federated Learning (FL) is a pivotal new paradigm for decentralized training on heterogeneous data. Recently, fine-tuning of Vision-Language Models (VLMs) has been extended to the federated setting to improve overall performance. Unfortunately, FL in this setting still faces two critical challenges that hinder its practical performance: data distribution heterogeneity and the high resource costs incurred by large VLMs. In this paper, we introduce FedPGA, a prototype-guided method for achieving performance improvements in the federated setting for VLMs. Concretely, we design a prototype-based adapter for the vision-language model CLIP. The lightweight adapter updates the prior knowledge encoded in CLIP to further enhance its adaptation capability and mitigate the effects of data distribution heterogeneity in the federated setting. Simultaneously, its small-scale operations reduce the computational and communication burden caused by large VLMs. Comprehensive empirical evaluations on nine diverse image classification datasets show that our method outperforms existing FL methods for VLMs.
AB - Federated Learning (FL) is a pivotal new paradigm for decentralized training on heterogeneous data. Recently, fine-tuning of Vision-Language Models (VLMs) has been extended to the federated setting to improve overall performance. Unfortunately, FL in this setting still faces two critical challenges that hinder its practical performance: data distribution heterogeneity and the high resource costs incurred by large VLMs. In this paper, we introduce FedPGA, a prototype-guided method for achieving performance improvements in the federated setting for VLMs. Concretely, we design a prototype-based adapter for the vision-language model CLIP. The lightweight adapter updates the prior knowledge encoded in CLIP to further enhance its adaptation capability and mitigate the effects of data distribution heterogeneity in the federated setting. Simultaneously, its small-scale operations reduce the computational and communication burden caused by large VLMs. Comprehensive empirical evaluations on nine diverse image classification datasets show that our method outperforms existing FL methods for VLMs.
KW - federated learning
KW - few-shot classification
KW - prototype-based adapter
KW - vision-language model
UR - https://www.scopus.com/pages/publications/105003888566
U2 - 10.1109/ICASSP49660.2025.10889303
DO - 10.1109/ICASSP49660.2025.10889303
M3 - Conference contribution
AN - SCOPUS:105003888566
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
BT - 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Proceedings
A2 - Rao, Bhaskar D
A2 - Trancoso, Isabel
A2 - Sharma, Gaurav
A2 - Mehta, Neelesh B.
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025
Y2 - 6 April 2025 through 11 April 2025
ER -