
Fine-Grained Scene Image Classification with Modality-Agnostic Adapter

  • Yiqun Wang
  • Zhao Zhou
  • Xiangcheng Du
  • Xingjiao Wu
  • Yingbin Zheng
  • Cheng Jin*

  *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In fine-grained scene image classification, most previous works place heavy emphasis on global visual features when performing multi-modal feature fusion. In other words, models are deliberately designed based on prior intuitions about the importance of different modalities. In this paper, we present a new multi-modal feature fusion approach named MAA (Modality-Agnostic Adapter), which aims to let the model adaptively learn the importance of different modalities in different cases, without building a prior into the model architecture. More specifically, we eliminate the modal differences in distribution and then use a modality-agnostic Transformer encoder for semantic-level feature fusion. Our experiments demonstrate that MAA achieves state-of-the-art results on benchmarks using the same modalities as previous methods. In addition, new modalities can be easily added to MAA to further boost performance.
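The two steps named in the abstract (removing distribution differences between modalities, then fusing with a modality-agnostic encoder) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the distributions are aligned or how the encoder is configured. The sketch assumes per-modality standardization as the alignment step, a single-head self-attention layer as a stand-in for the Transformer encoder, mean pooling as the final fusion, and features already projected to a common dimension `D`. All names (`standardize`, `fuse`, `modality_a`, `modality_b`) are hypothetical.

```python
import math
import random


def standardize(vec):
    """Rescale one modality's feature vector to zero mean and unit variance.

    Assumed stand-in for "eliminating the modal differences in distribution".
    """
    mean = sum(vec) / len(vec)
    var = sum((x - mean) ** 2 for x in vec) / len(vec)
    std = math.sqrt(var) + 1e-6  # small constant avoids division by zero
    return [(x - mean) / std for x in vec]


def matvec(mat, vec):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in mat]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, wq, wk, wv):
    """Single-head self-attention over modality tokens.

    The same projection weights are applied to every token, so the layer
    holds no modality-specific parameters.
    """
    d = len(tokens[0])
    qs = [matvec(wq, t) for t in tokens]
    ks = [matvec(wk, t) for t in tokens]
    vs = [matvec(wv, t) for t in tokens]
    outputs, weights = [], []
    for q in qs:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in ks]
        attn = softmax(scores)
        weights.append(attn)
        outputs.append([sum(a * v[i] for a, v in zip(attn, vs)) for i in range(d)])
    return outputs, weights


def fuse(modality_features, wq, wk, wv):
    """Standardize each modality, attend across modalities, then mean-pool."""
    tokens = [standardize(f) for f in modality_features]
    outputs, weights = self_attention(tokens, wq, wk, wv)
    d = len(outputs[0])
    fused = [sum(o[i] for o in outputs) / len(outputs) for i in range(d)]
    return fused, weights


def random_matrix(d, rng):
    """Random d x d projection matrix (untrained, for illustration only)."""
    return [[rng.gauss(0, 1 / math.sqrt(d)) for _ in range(d)] for _ in range(d)]


rng = random.Random(0)
D = 8  # assumed shared feature dimension
wq, wk, wv = (random_matrix(D, rng) for _ in range(3))

# Hypothetical modality features drawn on very different scales.
global_visual = [rng.gauss(0, 1) for _ in range(D)]
modality_a = [rng.gauss(5, 20) for _ in range(D)]
modality_b = [rng.gauss(-3, 0.1) for _ in range(D)]

fused, weights = fuse([global_visual, modality_a, modality_b], wq, wk, wv)
print(len(fused), [round(w, 3) for w in weights[0]])
```

Because `fuse` accepts a list of any length and the attention weights are computed from the inputs rather than fixed in the architecture, adding another modality in this sketch only means appending one more feature vector to the list.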

Original language: English
Title of host publication: 2024 IEEE International Conference on Multimedia and Expo, ICME 2024
Publisher: IEEE Computer Society
ISBN (Electronic): 9798350390155
DOI
Publication status: Published - 2024
Externally published: Yes
Event: 2024 IEEE International Conference on Multimedia and Expo, ICME 2024 - Niagara Falls, Canada
Duration: 15 Jul 2024 to 19 Jul 2024

Publication series

Name: Proceedings - IEEE International Conference on Multimedia and Expo
ISSN (Print): 1945-7871
ISSN (Electronic): 1945-788X

Conference

Conference: 2024 IEEE International Conference on Multimedia and Expo, ICME 2024
Country/Territory: Canada
City: Niagara Falls
Period: 15/07/24 to 19/07/24
