TY - GEN
T1 - A Sentimental Prompt Framework with Visual Text Encoder for Multimodal Sentiment Analysis
AU - Huang, Shizhou
AU - Xu, Bo
AU - Li, Changqun
AU - Ye, Jiabo
AU - Lin, Xin
N1 - Publisher Copyright:
© 2024 Copyright held by the owner/author(s).
PY - 2024/6/7
Y1 - 2024/6/7
N2 - Recently, multimodal sentiment analysis of social media posts has received increasing attention, as it can effectively improve on single-modality sentiment analysis by leveraging the complementary information between text and images. Despite their success, current methods still suffer from two weaknesses: (1) current methods for obtaining image representations do not capture sentiment information, which leads to a significant gap between the image representations and the sentiment predictions; (2) current methods ignore the sentiments expressed by symbols (emoticons, emojis) in the text, even though these symbols can effectively reflect the user’s sentiments. To address these issues, we propose a sentimental prompt framework with a visual text encoder (SPFVTE). Specifically, for the first problem, instead of using the image representation directly, we project the image representation as a prompt and utilize prompt learning to capture the sentimental information in images by learning a sentiment-specific prompt. For the second problem, considering that people derive the meanings of emojis and emoticons from their graphical form, we propose rendering the text as an image and using a visual text encoder to capture the sentiments conveyed by emojis and emoticons. We have conducted experiments on three public multimodal sentiment datasets, and the experimental results show that our method significantly and consistently outperforms the state-of-the-art methods. The datasets and source code can be found at https://github.com/JinFish/SPFVTE.
KW - multimodal fusion
KW - multimodal sentiment analysis
KW - social media posts
UR - https://www.scopus.com/pages/publications/85199136364
U2 - 10.1145/3652583.3658115
DO - 10.1145/3652583.3658115
M3 - Conference contribution
AN - SCOPUS:85199136364
T3 - ICMR 2024 - Proceedings of the 2024 International Conference on Multimedia Retrieval
SP - 638
EP - 646
BT - ICMR 2024 - Proceedings of the 14th Annual ACM International Conference on Multimedia Retrieval
PB - Association for Computing Machinery, Inc
T2 - 14th Annual ACM International Conference on Multimedia Retrieval, ICMR 2024
Y2 - 10 June 2024 through 14 June 2024
ER -