Sentiment-aware multimodal pre-training for multimodal sentiment analysis

  • Junjie Ye
  • Jie Zhou*
  • Junfeng Tian
  • Rui Wang
  • Jingyi Zhou
  • Tao Gui
  • Qi Zhang
  • Xuanjing Huang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

73 Scopus citations

Abstract

Pre-trained models, together with fine-tuning on downstream labeled datasets, have demonstrated great success in various tasks, including multimodal sentiment analysis. However, most multimodal pre-trained models focus on learning general lexical and/or visual information while ignoring sentiment signals. To address this problem, we propose a sentiment-aware multimodal pre-training (SMP) framework for multimodal sentiment analysis. In particular, we design a cross-modal contrastive learning module based on the interactions between visual and textual information, and introduce additional sentiment-aware pre-training objectives (e.g., fine-grained sentiment labeling) to capture fine-grained sentiment information from sentiment-rich datasets. We adopt two objectives (i.e., masked language modeling and masked auto-encoders) to capture semantic information from text and images. We conduct a series of experiments on sentence-level and target-oriented multimodal sentiment classification tasks, in which our SMP model exceeds state-of-the-art results. Additionally, ablation studies and case studies are conducted to verify the effectiveness of our SMP model.
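The abstract does not give the exact form of the cross-modal contrastive objective, but such modules are commonly implemented as a symmetric InfoNCE loss over paired text and image embeddings. The following is a minimal sketch under that assumption; the function name, temperature value, and embedding shapes are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def cross_modal_contrastive_loss(text_emb, image_emb, temperature=0.07):
    """Symmetric InfoNCE-style contrastive loss for paired modalities.

    text_emb, image_emb: (batch, dim) tensors; row i of each tensor is
    assumed to be a matched text-image pair (the positive), while all
    other rows in the batch serve as negatives.
    """
    # L2-normalize so the dot product is cosine similarity.
    text_emb = F.normalize(text_emb, dim=-1)
    image_emb = F.normalize(image_emb, dim=-1)

    # (batch, batch) similarity matrix; the diagonal holds positive pairs.
    logits = text_emb @ image_emb.t() / temperature
    targets = torch.arange(text_emb.size(0))

    # Contrast in both directions: text-to-image and image-to-text.
    loss_t2i = F.cross_entropy(logits, targets)
    loss_i2t = F.cross_entropy(logits.t(), targets)
    return (loss_t2i + loss_i2t) / 2
```

In practice this loss would be summed with the masked language modeling and masked auto-encoder objectives mentioned in the abstract to form the full pre-training objective.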

Original language: English
Article number: 110021
Journal: Knowledge-Based Systems
Volume: 258
State: Published - 22 Dec 2022
Externally published: Yes

Keywords

  • Multimodal
  • Pretraining
  • Sentiment analysis
