A Unified Visual Prompt Tuning Framework with Mixture-of-Experts for Multimodal Information Extraction

Bo Xu, Shizhou Huang, Ming Du, Hongya Wang, Hui Song, Yanghua Xiao, Xin Lin

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

12 Scopus citations

Abstract

Multimodal information extraction has recently gained increasing attention in social media understanding. Images added as auxiliary information help resolve the ambiguity that arises when short texts carry too little semantic information. Despite the success of existing approaches, current methods do not fully exploit the information provided by diverse representations of images. To address this problem, we propose a novel unified visual prompt tuning framework with Mixture-of-Experts that fuses different types of image representations for multimodal information extraction. Extensive experiments on two different multimodal information extraction tasks demonstrate the effectiveness of our method. The source code is available at https://github.com/xubodhu/VisualPT-MoE.
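The abstract says a Mixture-of-Experts is used to fuse several image representations. The snippet below is a generic, minimal sketch of soft-gated MoE fusion. It is not the authors' implementation, which lives in the linked repository. Several pieces are illustrative assumptions: the function names `softmax`, `dot` and `moe_fuse`, the use of a text-side query vector, and the dot-product gate. The dot-product gate stands in for a learned gating network. Each "expert output" is a hypothetical vector for one type of image representation.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def moe_fuse(text_query: Vector, expert_outputs: List[Vector]) -> Vector:
    """Hypothetical soft-gated fusion: score each expert's image
    representation against the text query, turn the scores into gate
    weights with a softmax, and return the gate-weighted sum.
    The dot-product gate is an assumed stand-in for a learned gate."""
    if not expert_outputs:
        raise ValueError("need at least one expert output")
    dim = len(text_query)
    if any(len(e) != dim for e in expert_outputs):
        raise ValueError("all vectors must share the same dimension")
    gates = softmax([dot(text_query, e) for e in expert_outputs])
    return [sum(g * e[i] for g, e in zip(gates, expert_outputs)) for i in range(dim)]


# Usage: two toy "image representations"; the query aligns with the first one.
fused = moe_fuse([3.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

With a neutral query the gates are uniform and the result is a plain average. A query that aligns with one representation shifts most of the weight onto that expert. A soft gate keeps every expert's contribution differentiable. Sparse top-k routing, common in large MoE models, is a different design choice.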

Original language: English
Title of host publication: Database Systems for Advanced Applications - 28th International Conference, DASFAA 2023, Proceedings
Editors: Xin Wang, Maria Luisa Sapino, Wook-Shin Han, Amr El Abbadi, Gill Dobbie, Zhiyong Feng, Yingxiao Shao, Hongzhi Yin
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 544-554
Number of pages: 11
ISBN (Print): 9783031306747
State: Published, 2023
Event: 28th International Conference on Database Systems for Advanced Applications, DASFAA 2023, Tianjin, China
Duration: 17 Apr 2023 to 20 Apr 2023

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13945 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 28th International Conference on Database Systems for Advanced Applications, DASFAA 2023
Country/Territory: China
City: Tianjin
Period: 17/04/23 to 20/04/23

Keywords

  • Mixture-of-Experts
  • Multimodal information extraction
  • Prompt learning
  • Social media
