TY - GEN
T1 - Unleashing the Potentials of Likelihood Composition for Multi-modal Language Models
AU - Zhao, Shitian
AU - Zhang, Renrui
AU - Luo, Xu
AU - Wang, Yan
AU - Zhang, Shanghang
AU - Gao, Peng
N1 - Publisher Copyright:
© 2024 Association for Computational Linguistics.
PY - 2024
Y1 - 2024
N2 - Model fusing has always been an important topic, especially in an era where large language models (LLMs) and multi-modal language models (MLMs) with different architectures, parameter sizes and training pipelines are being created all the time. In this work, we propose a post-hoc framework, which we call likelihood composition, aiming at fusing heterogeneous models off-the-shelf; the basic idea is to compose multiple models' likelihood distributions when doing a multiple-choice visual question answering task. Here the core concept, likelihood, is the log-probability of a candidate answer. In likelihood composition, we introduce some basic operations: debias, highlight, majority-vote and ensemble. By combining (composing) these basic operations, we obtain mixed composition methods: mix-composition. Through comprehensive experiments on 9 VQA datasets with 10 MLMs, we demonstrate the effectiveness of mix-composition compared with simple ensemble or majority-vote methods. Within this framework, new basic composition methods can be proposed and combined to form new mixed composition methods. We hope the proposed likelihood composition can provide a new perspective on fusing heterogeneous models and inspire further exploration under this framework.
AB - Model fusing has always been an important topic, especially in an era where large language models (LLMs) and multi-modal language models (MLMs) with different architectures, parameter sizes and training pipelines are being created all the time. In this work, we propose a post-hoc framework, which we call likelihood composition, aiming at fusing heterogeneous models off-the-shelf; the basic idea is to compose multiple models' likelihood distributions when doing a multiple-choice visual question answering task. Here the core concept, likelihood, is the log-probability of a candidate answer. In likelihood composition, we introduce some basic operations: debias, highlight, majority-vote and ensemble. By combining (composing) these basic operations, we obtain mixed composition methods: mix-composition. Through comprehensive experiments on 9 VQA datasets with 10 MLMs, we demonstrate the effectiveness of mix-composition compared with simple ensemble or majority-vote methods. Within this framework, new basic composition methods can be proposed and combined to form new mixed composition methods. We hope the proposed likelihood composition can provide a new perspective on fusing heterogeneous models and inspire further exploration under this framework.
UR - https://www.scopus.com/pages/publications/85217621569
U2 - 10.18653/v1/2024.findings-emnlp.594
DO - 10.18653/v1/2024.findings-emnlp.594
M3 - Conference contribution
AN - SCOPUS:85217621569
T3 - EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024
SP - 10152
EP - 10163
BT - EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024
A2 - Al-Onaizan, Yaser
A2 - Bansal, Mohit
A2 - Chen, Yun-Nung
PB - Association for Computational Linguistics (ACL)
T2 - 2024 Findings of the Association for Computational Linguistics, EMNLP 2024
Y2 - 12 November 2024 through 16 November 2024
ER -