TY - JOUR
T1 - Multi-modal multi-hop interaction network for dialogue response generation
AU - Zhou, Jie
AU - Tian, Junfeng
AU - Wang, Rui
AU - Wu, Yuanbin
AU - Yan, Ming
AU - He, Liang
AU - Huang, Xuanjing
N1 - Publisher Copyright:
© 2023 Elsevier Ltd
PY - 2023/10/1
Y1 - 2023/10/1
N2 - Most task-oriented dialogue systems generate informative and appropriate responses by leveraging structured knowledge bases which, in practice, are not always available. For instance, in the e-commerce scenario, commercial items often lack key attribute values while containing abundant unstructured multi-modal information, e.g., text descriptions and images. Previous studies have not fully explored such information for dialogue response generation. In this paper, we propose a Multi-modal multi-hop Interaction Network for Dialogue (MIND) to facilitate 1) the interaction between a query and multi-modal information through a query-aware multi-modal encoder and 2) the interaction between modalities through a multi-hop decoder. We conduct extensive experiments demonstrating the effectiveness of MIND over strong baselines; it achieves state-of-the-art performance in both automatic and human evaluations. We also release two real-world large-scale datasets containing both dialogue history and items’ multi-modal information to facilitate future research.
KW - Dialogue response generation
KW - Interaction
KW - Multimodal
UR - https://www.scopus.com/pages/publications/85156276692
U2 - 10.1016/j.eswa.2023.120267
DO - 10.1016/j.eswa.2023.120267
M3 - Article
AN - SCOPUS:85156276692
SN - 0957-4174
VL - 227
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 120267
ER -