
Efficient multimodal large language models: a survey

Yizhang Jin, Jian Li, Tianjun Gu, Yexin Liu, Bo Zhao, Jinxiang Lai, Zhenye Gan, Yabiao Wang, Chengjie Wang, Xin Tan, Lizhuang Ma*

*Corresponding author of this work

  • Shanghai Jiao Tong University
  • Tencent
  • East China Normal University
  • Beijing Academy of Artificial Intelligence
  • Hong Kong University of Science and Technology

Research output: Contribution to journal › Review article › peer review

Abstract

In recent years, multimodal large language models (MLLMs) have demonstrated remarkable performance on tasks such as visual question answering and visual understanding and reasoning. However, their large model sizes and high training and inference costs have hindered the widespread adoption of MLLMs in academia and industry. Studying efficient and lightweight MLLMs therefore has enormous potential, especially in edge-computing scenarios. In this survey, we provide a comprehensive and systematic review of the current state of efficient MLLMs. Specifically, we summarize the timeline of representative efficient MLLMs, the current state of research on structures and strategies, and their applications. Finally, we discuss the limitations of current efficient-MLLM research and promising future directions.

Original language: English
Article number: 27
Journal: Visual Intelligence
Volume: 3
Issue: 1
DOI
Publication status: Published - Dec 2025
Externally published: Yes
