Unleashing the Potentials of Likelihood Composition for Multi-modal Language Models

  • Shitian Zhao
  • Renrui Zhang
  • Xu Luo
  • Yan Wang
  • Shanghang Zhang
  • Peng Gao

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Model fusing has always been an important topic, especially in an era where large language models (LLMs) and multi-modal language models (MLMs) with different architectures, parameter sizes and training pipelines are being created all the time. In this work, we propose a post-hoc framework for fusing heterogeneous off-the-shelf models, which we call likelihood composition. The basic idea is to compose the likelihood distributions of multiple models when doing a multi-choice visual-question-answering (VQA) task. Here the core concept, likelihood, is the log-probability of a candidate answer. In likelihood composition, we introduce four basic operations: debias, highlight, majority-vote and ensemble. By combining (composing) these basic elements, we obtain mixed composition methods, which we call mix-composition. Through comprehensive experiments on 9 VQA datasets and 10 MLMs, we demonstrate the effectiveness of mix-composition compared with simple ensemble or majority-vote methods. Within this framework, new basic composition methods can be proposed and combined into new mixed composition methods. We hope that likelihood composition provides a new perspective on fusing heterogeneous models and inspires further exploration under this framework.
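
As a rough illustration of two of the basic operations named in the abstract, the sketch below composes per-candidate log-probabilities from several models. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the `scores` values, and the choice to sum log-probabilities for the ensemble are all hypothetical. The debias and highlight operations are left out because the abstract does not define them.

```python
from collections import Counter

def ensemble(loglik):
    """Pick the candidate with the highest summed log-probability.

    loglik: list of rows, one per model; each row holds one
    log-probability per candidate answer (assumed layout).
    """
    n_candidates = len(loglik[0])
    totals = [sum(row[j] for row in loglik) for j in range(n_candidates)]
    return max(range(n_candidates), key=totals.__getitem__)

def majority_vote(loglik):
    """Each model votes for its own top candidate; the most common vote wins."""
    votes = [max(range(len(row)), key=row.__getitem__) for row in loglik]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical log-probabilities: 3 models, 3 candidate answers.
scores = [
    [-0.1, -3.0, -4.0],  # model A strongly prefers candidate 0
    [-1.2, -1.0, -3.0],  # model B slightly prefers candidate 1
    [-1.3, -1.1, -2.5],  # model C slightly prefers candidate 1
]

print(ensemble(scores))       # candidate index chosen by summed log-probabilities
print(majority_vote(scores))  # candidate index chosen by per-model votes
```

With these made-up numbers the two operations disagree: one confident model outweighs two weakly opposed models in the ensemble, while majority vote ignores confidence. This kind of difference is what makes combining basic operations worth exploring.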

Original language: English
Title of host publication: EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Publisher: Association for Computational Linguistics (ACL)
Pages: 10152-10163
Number of pages: 12
ISBN (Electronic): 9798891761681
State: Published - 2024
Event: 2024 Findings of the Association for Computational Linguistics, EMNLP 2024 - Hybrid, Miami, United States
Duration: 12 Nov 2024 to 16 Nov 2024

Publication series

Name: EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024

Conference

Conference: 2024 Findings of the Association for Computational Linguistics, EMNLP 2024
Country/Territory: United States
City: Hybrid, Miami
Period: 12/11/24 to 16/11/24
