A Graph Interaction Framework on Relevance for Multimodal Named Entity Recognition with Multiple Images

  • Jiachen Zhao
  • , Shizhou Huang
  • , Xin Lin*
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

2 Scopus citations

Abstract

Posts containing multiple images have significant research potential in Multimodal Named Entity Recognition nowadays. The previous methods determine whether the images are related to named entities in the text through similarity computation, such as using CLIP. However, it is not effective in some cases and not conducive to task transfer, especially in multi-image scenarios. To address the issue, we propose a graph interaction framework on relevance (GIFR) for Multimodal Named Entity Recognition with multiple images. For humans, they have the abilities to distinguish whether an image is relevant to named entities, but human capabilities are difficult to model. Therefore, we propose using reinforcement learning based on human preference to integrate human abilities into the model to determine whether an image-text pair is relevant, which is referred to as relevance. To better leverage relevance, we construct a heterogeneous graph and introduce graph transformer to enable information interaction. Experiments on benchmark datasets demonstrate that our method achieves the state-of-the-art performance.

Original languageEnglish
Title of host publicationMain Conference
EditorsOwen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
PublisherAssociation for Computational Linguistics (ACL)
Pages1237-1246
Number of pages10
ISBN (Electronic)9798891761964
StatePublished - 2025
Event31st International Conference on Computational Linguistics, COLING 2025 - Abu Dhabi, United Arab Emirates
Duration: 19 Jan 202524 Jan 2025

Publication series

NameProceedings - International Conference on Computational Linguistics, COLING
ISSN (Print)2951-2093

Conference

Conference31st International Conference on Computational Linguistics, COLING 2025
Country/TerritoryUnited Arab Emirates
CityAbu Dhabi
Period19/01/2524/01/25

Fingerprint

Dive into the research topics of 'A Graph Interaction Framework on Relevance for Multimodal Named Entity Recognition with Multiple Images'. Together they form a unique fingerprint.

Cite this