Multi-modal multi-hop interaction network for dialogue response generation

Jie Zhou, Junfeng Tian, Rui Wang, Yuanbin Wu, Ming Yan, Liang He, Xuanjing Huang

Research output: Contribution to journal › Article › peer-review


Abstract

Most task-oriented dialogue systems generate informative and appropriate responses by leveraging structured knowledge bases which, in practice, are not always available. For instance, in the e-commerce scenario, commercial items often lack key attribute values while containing abundant unstructured multi-modal information, e.g., text descriptions and images. Previous studies have not fully exploited such information for dialogue response generation. In this paper, we propose a Multi-modal multi-hop Interaction Network for Dialogue (MIND) to facilitate 1) the interaction between a query and multi-modal information through a query-aware multi-modal encoder and 2) the interaction between modalities through a multi-hop decoder. We conduct extensive experiments to demonstrate the effectiveness of MIND over strong baselines, achieving state-of-the-art performance under both automatic and human evaluation. We also release two real-world large-scale datasets containing both dialogue history and items' multi-modal information to facilitate future research.
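
The abstract only names the two interactions at a high level. Below is a minimal, hypothetical PyTorch sketch of what a query-aware multi-modal encoder and a multi-hop decoder could look like; the class names (QueryAwareMultiModalEncoder, MultiHopDecoder), dimensions, hop count, and the shared attention module are all illustrative assumptions, not the paper's actual MIND implementation.

```python
# Hypothetical sketch of the two interactions described in the abstract.
# All module names, dimensions, and the hop structure are assumptions
# for illustration; they are not taken from the MIND paper itself.
import torch
import torch.nn as nn


class QueryAwareMultiModalEncoder(nn.Module):
    """Lets the dialogue query attend over unstructured text and image
    features (interaction 1 in the abstract)."""

    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        self.text_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.image_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, query, text_feats, image_feats):
        # Query-conditioned summaries of each modality.
        text_ctx, _ = self.text_attn(query, text_feats, text_feats)
        image_ctx, _ = self.image_attn(query, image_feats, image_feats)
        return text_ctx, image_ctx


class MultiHopDecoder(nn.Module):
    """Alternates attention between modalities over several hops
    (interaction 2 in the abstract) before producing decoder states.
    Sharing one attention module across hops is a simplification."""

    def __init__(self, d_model: int = 256, n_heads: int = 4, hops: int = 2):
        super().__init__()
        self.hops = hops
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, dec_state, text_ctx, image_ctx):
        for _ in range(self.hops):
            # One hop reads the text-side context, the next the image side,
            # so information from one modality can refine the other.
            dec_state, _ = self.cross_attn(dec_state, text_ctx, text_ctx)
            dec_state, _ = self.cross_attn(dec_state, image_ctx, image_ctx)
        return dec_state


if __name__ == "__main__":
    B, Lq, Lt, Li, D = 2, 8, 32, 16, 256
    enc = QueryAwareMultiModalEncoder(D)
    dec = MultiHopDecoder(D)
    q = torch.randn(B, Lq, D)   # dialogue query states
    t = torch.randn(B, Lt, D)   # item text-description features
    v = torch.randn(B, Li, D)   # item image-region features
    text_ctx, image_ctx = enc(q, t, v)
    out = dec(torch.randn(B, Lq, D), text_ctx, image_ctx)
    print(out.shape)            # torch.Size([2, 8, 256])
```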

Original language: English
Article number: 120267
Journal: Expert Systems with Applications
Volume: 227
DOIs
State: Published - 1 Oct 2023

Keywords

  • Dialogue response generation
  • Interaction
  • Multimodal

