Exploring Cognitive and Aesthetic Causality for Multimodal Aspect-Based Sentiment Analysis

  • Luwei Xiao
  • Rui Mao*
  • Shuai Zhao
  • Qika Lin
  • Yanhao Jia
  • Liang He
  • Erik Cambria

* Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

11 Scopus citations

Abstract

Multimodal aspect-based sentiment classification (MASC), which aims to predict sentiment polarity toward specific aspect targets (i.e., entities or attributes explicitly mentioned in text-image pairs), is an emerging task driven by the increase in user-generated multimodal content on social platforms. Despite extensive efforts and significant achievements in existing MASC research, substantial gaps remain in understanding fine-grained visual content and the cognitive rationales derived from semantic content and impressions (cognitive interpretations of emotions evoked by image content). In this study, we present Chimera: a cognitive and aesthetic sentiment causality understanding framework to derive fine-grained holistic features of aspects and infer the fundamental drivers of sentiment expression from both semantic perspectives and affective-cognitive resonance (the synergistic effect between emotional responses and cognitive interpretations). The framework aligns visual patches with words, extracts coarse and fine-grained visual features, translates them into textual descriptions, and uses LLM-generated sentimental causes and impressions to boost sensitivity to affective cues. Experiments on MASC datasets show the model’s effectiveness and greater flexibility compared to LLMs like GPT-4o.

Original language: English
Pages (from-to): 3248-3265
Number of pages: 18
Journal: IEEE Transactions on Affective Computing
Volume: 16
Issue number: 4
State: Published - 2025

Keywords

  • Multimodal aspect-based sentiment classification (MASC)
  • affective-cognitive resonance
  • large language models
  • sentiment causality
