Vanessa: Visual Connotation and Aesthetic Attributes Understanding Network for Multimodal Aspect-based Sentiment Analysis

Luwei Xiao, Rui Mao, Xulang Zhang, Liang He, Erik Cambria

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

8 Scopus citations

Abstract

Prevailing research concentrates on superficial features or descriptions of images, revealing a significant gap in the systematic exploration of their connotative and aesthetic attributes. Furthermore, the use of cross-modal relation detection modules to eliminate noise from comprehensive image representations leads to the omission of subtle contextual information. We present Vanessa, a visual connotation and aesthetic Attributes understanding network for multimodal aspect-based sentiment analysis. It incorporates a multi-aesthetic attributes aggregation (MA3) module that models intra- and inter-dependencies among bi-modal representations as well as emotion-laden aesthetic attributes. Moreover, we devise a self-supervised contrastive learning framework to explore the pairwise relevance between images and text via the Gaussian distribution of their CLIP scores. By dynamically clustering and merging multimodal tokens, Vanessa effectively captures both implicit and explicit sentimental cues. Extensive experiments on two widely adopted benchmarks verify Vanessa's effectiveness.

Original languageEnglish
Title of host publicationEMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024
EditorsYaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
PublisherAssociation for Computational Linguistics (ACL)
Pages11486-11500
Number of pages15
ISBN (Electronic)9798891761681
DOIs
StatePublished - 2024
Event2024 Findings of the Association for Computational Linguistics, EMNLP 2024 - Hybrid, Miami, United States
Duration: 12 Nov 202416 Nov 2024

Publication series

NameEMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024

Conference

Conference2024 Findings of the Association for Computational Linguistics, EMNLP 2024
Country/TerritoryUnited States
CityHybrid, Miami
Period12/11/2416/11/24

Fingerprint

Dive into the research topics of 'Vanessa: Visual Connotation and Aesthetic Attributes Understanding Network for Multimodal Aspect-based Sentiment Analysis'. Together they form a unique fingerprint.

Cite this