
Multimodal Machine Translation with Text-Image In-depth Questioning

  • Yue Gao
  • Jing Zhao*
  • Shiliang Sun*
  • Xiaosong Qiao
  • Tengfei Song
  • Hao Yang
  • *Corresponding author for this work
  • East China Normal University
  • Shanghai Jiao Tong University
  • Huawei Technologies Co., Ltd.

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

Multimodal machine translation (MMT) integrates visual information to address ambiguity and contextual limitations in neural machine translation (NMT). Some empirical studies have revealed that many MMT models underutilize visual data during translation. These studies attempt to enhance cross-modal interactions so that visual data is better exploited. However, they focus only on simple interactions between nouns in the text and the corresponding entities in the image, overlooking global semantic alignment, particularly for prepositional phrases and verbs in the text, which are more likely to be translated incorrectly. To address this, we design a Text-Image In-depth Questioning method to deepen interactions and optimize translations. Furthermore, to mitigate errors arising from contextually irrelevant image noise, we propose a Consistency Constraint strategy to improve our approach's robustness. Our approach achieves state-of-the-art results on five translation directions of Multi30K and AmbigCaps, with +2.35 BLEU on the challenging MSCOCO benchmark, validating our method's effectiveness in utilizing visual data and capturing comprehensive textual semantics.

Original language: English
Title of host publication: Findings of the Association for Computational Linguistics
Subtitle of host publication: ACL 2025
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Publisher: Association for Computational Linguistics (ACL)
Pages: 9274-9287
Number of pages: 14
ISBN (electronic): 9798891762565
DOI
Publication status: Published - 2025
Event: 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025 - Vienna, Austria
Duration: 27 Jul 2025 to 1 Aug 2025

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
ISSN (print): 0736-587X

Conference

Conference: 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025
Country/Territory: Austria
City: Vienna
Period: 27/07/25 to 1/08/25
