
Novel Category Discovery with X-Agent Attention for Open-Vocabulary Semantic Segmentation

  • Jiahao Li
  • Yang Lu
  • Yachao Zhang
  • Fangyong Wang
  • Yuan Xie*
  • Yanyun Qu*
*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

Abstract

Open-vocabulary semantic segmentation (OVSS) performs pixel-level classification via text-driven alignment, where the domain discrepancy between training on base categories and open-vocabulary inference makes it challenging to discriminatively model latent unseen categories. Existing vision-language model (VLM)-based approaches address this challenge with commendable performance by leveraging pre-trained multi-modal representations. However, the fundamental mechanisms of latent semantic comprehension remain underexplored, and this gap has become a bottleneck for OVSS. In this work, we conduct a probing experiment to explore the distribution patterns and dynamics of latent semantics in VLMs under inductive learning paradigms. Building on these insights, we propose X-Agent, an OVSS framework that employs a latent semantic-aware "agent" to orchestrate cross-modal attention, simultaneously optimizing latent semantic dynamics and amplifying their perceptibility. Extensive benchmark evaluations demonstrate that X-Agent achieves state-of-the-art performance while effectively enhancing latent semantic saliency.
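The opening sentence of the abstract describes OVSS as pixel-level classification via text-driven alignment. The following is a minimal sketch of that generic idea only, assuming CLIP-style pixel and text embeddings that live in one shared space; it is not the X-Agent method, and the function names, class names, and toy 2-D embeddings are hypothetical. Each pixel receives the label whose text embedding has the highest cosine similarity, so a category never seen in training (here the hypothetical "zebra") can still be predicted as long as its text embedding is supplied at inference.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def segment(pixel_embeddings, text_embeddings, class_names):
    """Label each pixel with the class whose text embedding is most similar.

    Hypothetical helper: pixel_embeddings is an H x W grid of D-dim vectors,
    text_embeddings holds one D-dim vector per entry of class_names.
    """
    texts = [l2_normalize(t) for t in text_embeddings]
    labels = []
    for row in pixel_embeddings:
        label_row = []
        for pixel in row:
            p = l2_normalize(pixel)
            # Cosine similarity between this pixel and every class text embedding.
            scores = [sum(a * b for a, b in zip(p, t)) for t in texts]
            best = max(range(len(scores)), key=scores.__getitem__)
            label_row.append(class_names[best])
        labels.append(label_row)
    return labels


# Toy example: "cat" and "dog" play the role of base classes, while "zebra"
# is a hypothetical novel class that only needs a text embedding at inference.
class_names = ["cat", "dog", "zebra"]
text_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
pixels = [[[0.9, 0.1], [0.1, 0.9]],
          [[1.0, 1.0], [0.2, 0.8]]]
print(segment(pixels, text_embeddings, class_names))
# [['cat', 'dog'], ['zebra', 'dog']]
```

Because classification reduces to a nearest-text-embedding lookup, the label set at inference is simply a list of category names; the difficulty the abstract highlights is that pixel embeddings for unseen categories may not align well with their text embeddings after training on base categories only.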

Original language: English
Host publication title: MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025
Publisher: Association for Computing Machinery, Inc
Pages: 2929-2938
Number of pages: 10
ISBN (electronic): 9798400720352
DOI
Publication status: Published - 27 Oct 2025
Event: 33rd ACM International Conference on Multimedia, MM 2025 - Dublin, Ireland
Duration: 27 Oct 2025 → 31 Oct 2025

Publication series

Name: MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025

Conference

Conference: 33rd ACM International Conference on Multimedia, MM 2025
Country/Territory: Ireland
City: Dublin
Period: 27/10/25 → 31/10/25
