A Unified Visual Prompt Tuning Framework with Mixture-of-Experts for Multimodal Information Extraction

  • Bo Xu
  • Shizhou Huang
  • Ming Du
  • Hongya Wang
  • Hui Song
  • Yanghua Xiao
  • Xin Lin*
  • *Corresponding author for this work
  • Donghua University
  • Fudan University
  • Fudan-Aishu Cognitive Intelligence Joint Research Center

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Multimodal information extraction has recently gained increasing attention in social media understanding: images serve as auxiliary information that helps resolve the ambiguity caused by insufficient semantic information in short texts. Despite the success of current methods, they do not take full advantage of the information provided by the diverse representations of images. To address this problem, we propose a novel unified visual prompt tuning framework with Mixture-of-Experts to fuse different types of image representations for multimodal information extraction. Extensive experiments conducted on two different multimodal information extraction tasks demonstrate the effectiveness of our method. The source code can be found at https://github.com/xubodhu/VisualPT-MoE.

Original language: English
Title of host publication: Database Systems for Advanced Applications - 28th International Conference, DASFAA 2023, Proceedings
Editors: Xin Wang, Maria Luisa Sapino, Wook-Shin Han, Amr El Abbadi, Gill Dobbie, Zhiyong Feng, Yingxiao Shao, Hongzhi Yin
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 544-554
Number of pages: 11
ISBN (Print): 9783031306747
DOI
Publication status: Published - 2023
Event: 28th International Conference on Database Systems for Advanced Applications, DASFAA 2023 - Tianjin, China
Duration: 17 Apr 2023 to 20 Apr 2023

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13945 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 28th International Conference on Database Systems for Advanced Applications, DASFAA 2023
Country/Territory: China
City: Tianjin
Period: 17/04/23 to 20/04/23
