
VQA-Augmented Machine Translation with Cross-Modal Contrastive Learning

  • Zhihui Zhang
  • Shiliang Sun*
  • Jing Zhao*
  • Tengfei Song
  • Hao Yang
  • *Corresponding author for this work
  • East China Normal University
  • Shanghai Jiao Tong University
  • Huawei Technologies Co., Ltd.

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

Multimodal machine translation (MMT) aims to enhance translation quality by integrating visual information. However, existing methods often extract visual features using pre-trained models while learning text features from scratch, leading to representation imbalance. These methods are also prone to being misled by redundant visual information, which results in suboptimal performance. To address these challenges, we propose CAMT, a novel cross-modal VQA-augmented MMT method. CAMT aligns image-source text pairs and image-question text pairs through dual-text contrastive learning, thereby improving semantic consistency across modalities. Additionally, we design an effective strategy for generating question–answer pairs to enhance fine-grained alignment and filter out irrelevant visual noise, while also addressing the scarcity of VQA annotations. Extensive experiments on multiple benchmark datasets demonstrate the effectiveness of the proposed CAMT framework, which consistently outperforms state-of-the-art MMT methods across multiple evaluation metrics.
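
The abstract does not spell out the loss, so the following is only a minimal sketch of what a dual-text contrastive objective could look like. It sums a symmetric InfoNCE loss over image/source-text pairs and over image/question-text pairs. The function names, the temperature of 0.07, the equal weighting of the two terms, and the toy two-dimensional embeddings are all assumptions made for illustration; none of them are taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def info_nce(anchors, positives, temperature=0.07):
    """One-directional InfoNCE: anchors[i] should match positives[i].

    All other positives in the batch act as negatives for anchors[i].
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, a_i in enumerate(a):
        # Cosine similarity of anchor i to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(a_i, p_j)) / temperature for p_j in p]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denom - logits[i]
    return total / len(a)


def symmetric_contrastive(images, texts, temperature=0.07):
    """Average of the image-to-text and text-to-image InfoNCE losses."""
    return 0.5 * (
        info_nce(images, texts, temperature)
        + info_nce(texts, images, temperature)
    )


def dual_text_contrastive(images, source_texts, question_texts, temperature=0.07):
    """Hypothetical dual-text objective: image/source plus image/question terms.

    Equal weighting of the two terms is an assumption of this sketch.
    """
    return (
        symmetric_contrastive(images, source_texts, temperature)
        + symmetric_contrastive(images, question_texts, temperature)
    )


# Toy embeddings (made up): two images and two text views of each.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned_texts = [[0.9, 0.1], [0.1, 0.9]]   # row i matches image i
shuffled_texts = [[0.1, 0.9], [0.9, 0.1]]  # row i matches the other image

loss_aligned = dual_text_contrastive(images, aligned_texts, aligned_texts)
loss_shuffled = dual_text_contrastive(images, aligned_texts, shuffled_texts)
print(loss_aligned < loss_shuffled)
```

In this toy setup the loss is lower when the question-text embeddings line up with their images than when they are swapped, which is the behaviour a contrastive alignment term is meant to reward.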

Original language: English
Title of host publication: EMNLP 2025 - 2025 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2025
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Publisher: Association for Computational Linguistics (ACL)
Pages: 10113-10124
Number of pages: 12
ISBN (electronic): 9798891763357
DOI
Publication status: Published - 2025
Event: 30th Conference on Empirical Methods in Natural Language Processing, EMNLP 2025 - Suzhou, China
Duration: 4 Nov 2025 to 9 Nov 2025

Publication series

Name: EMNLP 2025 - 2025 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2025

Conference

Conference: 30th Conference on Empirical Methods in Natural Language Processing, EMNLP 2025
Country/Territory: China
City: Suzhou
Period: 4 Nov 2025 to 9 Nov 2025
