
Learning Long- and Short-Term User Literal-Preference with Multimodal Hierarchical Transformer Network for Personalized Image Caption

  • Wei Zhang*
  • Yue Ying
  • Pan Lu
  • Hongyuan Zha
  • *Corresponding author for this work
  • Shanghai AI Laboratory
  • East China Normal University
  • University of California at Los Angeles
  • Georgia Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Personalized image caption, a natural extension of the standard image caption task, requires generating brief image descriptions tailored to users' writing styles and traits, and is more practical for meeting users' real demands. Only a few recent studies have shed light on this crucial task, and they learn static user representations to capture users' long-term literal-preference. However, this is insufficient for satisfactory performance, because users exhibit not only long-term literal-preference but also short-term literal-preference, which is associated with their recent states. To bridge this gap, this paper develops a novel multimodal hierarchical transformer network (MHTN) for personalized image caption. At the low level, a short-term user encoder learns short-term user literal-preference from users' recent captions. At the high level, a multimodal encoder integrates target image representations with the short-term literal-preference, as well as with the long-term literal-preference learned from user IDs. Both encoders benefit from the power of transformer networks. Extensive experiments on two real datasets demonstrate the effectiveness of considering both types of user literal-preference simultaneously, and show better performance than state-of-the-art models.
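The two-level data flow described in the abstract can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' MHTN implementation: it assumes image regions and recent captions are already encoded as fixed-size vectors, replaces full transformer layers with a single attention head using identity projections, models long-term preference as a lookup keyed by user ID with randomly initialized (untrained) vectors, and omits the caption decoder entirely. The names `short_term_encoder`, `LongTermTable`, and `multimodal_encoder` are hypothetical.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention with identity Q/K/V.

    Assumption: a real transformer layer adds learned projections, multiple
    heads, residual connections, layer norm and a feed-forward sublayer.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in tokens])
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out


def mean_pool(tokens):
    """Average a list of vectors into a single vector."""
    n = len(tokens)
    return [sum(t[i] for t in tokens) / n for i in range(len(tokens[0]))]


def short_term_encoder(recent_caption_vecs):
    """Low level (hypothetical): encode a user's recent captions, then pool
    them into one short-term preference vector."""
    return mean_pool(self_attention(recent_caption_vecs))


class LongTermTable:
    """Hypothetical long-term preference store: one vector per user ID.

    Vectors are randomly initialized on first lookup; in a trained model
    they would be learned embedding parameters.
    """

    def __init__(self, dim, seed=0):
        self.dim = dim
        self.rng = random.Random(seed)
        self.table = {}

    def __call__(self, user_id):
        if user_id not in self.table:
            self.table[user_id] = [self.rng.gauss(0.0, 0.1) for _ in range(self.dim)]
        return self.table[user_id]


def multimodal_encoder(image_regions, short_pref, long_pref):
    """High level (hypothetical): attend jointly over both preference
    vectors and the image-region vectors, treated as one token sequence."""
    return self_attention([long_pref, short_pref] + image_regions)


dim = 4
rng = random.Random(42)
regions = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(3)]
recent = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(5)]
long_term = LongTermTable(dim)
fused = multimodal_encoder(regions, short_term_encoder(recent), long_term("user_42"))
print(len(fused), len(fused[0]))  # prints "5 4": 2 preference tokens + 3 regions
```

Treating the two preference vectors as extra tokens alongside the image regions lets every region attend to both kinds of preference in a single pass; the abstract does not specify whether MHTN fuses them this way or by another mechanism.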

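Original language: English
Host publication title: AAAI 2020 - 34th AAAI Conference on Artificial Intelligence
Publisher: AAAI Press
Pages: 9571-9578
Number of pages: 8
ISBN (electronic): 9781577358350
Publication status: Published - 2020
Event: 34th AAAI Conference on Artificial Intelligence, AAAI 2020 - New York, United States
Duration: 7 Feb 2020 to 12 Feb 2020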

Publication series

Name: AAAI 2020 - 34th AAAI Conference on Artificial Intelligence

Conference

Conference: 34th AAAI Conference on Artificial Intelligence, AAAI 2020
Country/Territory: United States
City: New York
Period: 7/02/20 to 12/02/20
