TY - JOUR
T1 - Sentiment-aware multimodal pre-training for multimodal sentiment analysis
AU - Ye, Junjie
AU - Zhou, Jie
AU - Tian, Junfeng
AU - Wang, Rui
AU - Zhou, Jingyi
AU - Gui, Tao
AU - Zhang, Qi
AU - Huang, Xuanjing
N1 - Publisher Copyright:
© 2022 Elsevier B.V.
PY - 2022/12/22
Y1 - 2022/12/22
N2 - Pre-trained models, together with fine-tuning on downstream labeled datasets, have demonstrated great success in various tasks, including multimodal sentiment analysis. However, most multimodal pre-trained models focus on learning general lexical and/or visual information, while ignoring sentiment signals. To address this problem, we propose a sentiment-aware multimodal pre-training (SMP) framework for multimodal sentiment analysis. In particular, we design a cross-modal contrastive learning module based on the interactions between visual and textual information, and introduce additional sentiment-aware pre-training objectives (e.g., fine-grained sentiment labeling) to capture fine-grained sentiment information from sentiment-rich datasets. We adopt two objectives (i.e., masked language modeling and masked auto-encoders) to capture semantic information from text and images. We conduct a series of experiments on sentence-level and target-oriented multimodal sentiment classification tasks, wherein the results of our SMP model exceed the state-of-the-art results. Additionally, ablation studies and case studies are conducted to verify the effectiveness of our SMP model.
AB - Pre-trained models, together with fine-tuning on downstream labeled datasets, have demonstrated great success in various tasks, including multimodal sentiment analysis. However, most multimodal pre-trained models focus on learning general lexical and/or visual information, while ignoring sentiment signals. To address this problem, we propose a sentiment-aware multimodal pre-training (SMP) framework for multimodal sentiment analysis. In particular, we design a cross-modal contrastive learning module based on the interactions between visual and textual information, and introduce additional sentiment-aware pre-training objectives (e.g., fine-grained sentiment labeling) to capture fine-grained sentiment information from sentiment-rich datasets. We adopt two objectives (i.e., masked language modeling and masked auto-encoders) to capture semantic information from text and images. We conduct a series of experiments on sentence-level and target-oriented multimodal sentiment classification tasks, wherein the results of our SMP model exceed the state-of-the-art results. Additionally, ablation studies and case studies are conducted to verify the effectiveness of our SMP model.
KW - Multimodal
KW - Pretraining
KW - Sentiment analysis
UR - https://www.scopus.com/pages/publications/85142430234
U2 - 10.1016/j.knosys.2022.110021
DO - 10.1016/j.knosys.2022.110021
M3 - Article
AN - SCOPUS:85142430234
SN - 0950-7051
VL - 258
JO - Knowledge-Based Systems
JF - Knowledge-Based Systems
M1 - 110021
ER -