Federated Prototype Guided Adaption for Vision-Language Models

Youchao Liu, Dingjiang Huang

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

Federated Learning (FL) is a pivotal paradigm for decentralized training on heterogeneous data. Recently, fine-tuning of Vision-Language Models (VLMs) has been extended to the federated setting to improve overall performance. In this setting, however, FL still faces two critical challenges that limit its practical performance: heterogeneous data distributions and the high resource costs of large VLMs. In this paper, we introduce FedPGA, a prototype-guided method for improving the performance of VLMs in the federated setting. Concretely, we design a prototype-based adapter for the vision-language model CLIP. The lightweight adapter updates the prior knowledge encoded in CLIP to further enhance its adaption capability and to avoid the effects of data distribution heterogeneity in the federated setting. At the same time, its small-scale operations can mitigate the computational and communication burden caused by large VLMs. Our comprehensive empirical evaluations on nine diverse image classification datasets show that our method outperforms existing FL methods for VLMs.

Original language: English
Title of host publication: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Proceedings
Editors: Bhaskar D Rao, Isabel Trancoso, Gaurav Sharma, Neelesh B. Mehta
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350368741
State: Published - 2025
Event: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Hyderabad, India
Duration: 6 Apr 2025 to 11 Apr 2025

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025
Country/Territory: India
City: Hyderabad
Period: 6/04/25 to 11/04/25

Keywords

  • federated learning
  • few-shot classification
  • prototype-based adapter
  • vision-language model
