
Modeling Intra- and Inter-Modal Alignment with Optimal Transport for Visual Dialog

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Visual dialog aims to answer a sequence of questions by reasoning effectively over both the dialog history and the image content. While existing methods primarily focus on devising various attention mechanisms to capture interactions between modalities, explicit signals that encourage semantic alignment in visual dialog are seldom used. In this paper, we present a novel approach that leverages Optimal Transport to provide explicit and interpretable training signals that guide intra- and inter-modal alignment between the text and the image in visual dialog. Specifically, our approach consists of two kinds of alignment modules, Word-Word Alignment (WWA) and Region-Word Alignment (RWA). The WWA module learns latent relationships between a given question and the dialog history to align different concepts or pronouns that refer to the same entity. The RWA module models the internal structures of text and images as graphs and performs graph matching for region-word alignment. We perform experiments on the benchmark dataset VisDial v1.0, and the experimental results show that our proposed approach achieves new state-of-the-art performance on most metrics.
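To make the role of Optimal Transport in word-word alignment more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a cosine-distance cost between word embeddings, uniform marginals, and an entropy-regularized Sinkhorn solver; the abstract specifies none of these choices. The names `cosine_cost`, `sinkhorn_plan`, `question`, and `history` are hypothetical, and the embeddings are random placeholders.

```python
import numpy as np


def cosine_cost(x, y):
    """Cost matrix C[i, j] = 1 - cos(x_i, y_j) (assumed cost, not from the paper)."""
    x_n = x / np.linalg.norm(x, axis=1, keepdims=True)
    y_n = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - x_n @ y_n.T


def sinkhorn_plan(cost, eps=0.5, n_iters=200):
    """Entropy-regularized OT plan between two uniform distributions."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)   # uniform mass over question words (assumption)
    b = np.full(m, 1.0 / m)   # uniform mass over history words (assumption)
    K = np.exp(-cost / eps)   # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)       # rescale rows toward marginal a
        v = b / (K.T @ u)     # rescale columns toward marginal b
    return u[:, None] * K * v[None, :]


# Placeholder embeddings: 4 question words and 6 history words, 16 dimensions.
rng = np.random.default_rng(0)
question = rng.normal(size=(4, 16))
history = rng.normal(size=(6, 16))

cost = cosine_cost(question, history)
plan = sinkhorn_plan(cost)
# The transport cost could serve as an alignment signal during training.
ot_distance = float(np.sum(plan * cost))
```

In this sketch, a large entry `plan[i, j]` means question word `i` sends most of its mass to history word `j`, which is one way a pronoun could be softly matched to the entity it refers to. The same routine would apply to a region-word cost matrix, although the graph matching that the abstract describes for RWA is not covered here.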

Original language: English
Title of host publication: Proceedings - 2023 IEEE 35th International Conference on Tools with Artificial Intelligence, ICTAI 2023
Publisher: IEEE Computer Society
Pages: 805-812
Number of pages: 8
ISBN (Electronic): 9798350342734
DOI
Publication status: Published - 2023
Event: 35th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2023 - Atlanta, United States
Duration: 6 Nov 2023 → 8 Nov 2023

Publication series

Name: Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI
ISSN (Print): 1082-3409

Conference

Conference: 35th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2023
Country/Territory: United States
City: Atlanta
Period: 6/11/23 → 8/11/23
