Cross-Stage Class-Specific Attention for Image Semantic Segmentation

Zhengyi Shi, Li Sun, Qingli Li

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Recent transformer-based backbones capture context over a significantly larger area than CNNs and greatly improve semantic segmentation performance. However, the fact that the decoder utilizes features from different stages in the shallow layers indicates that local context is still important. Instead of simply incorporating features from different stages, we propose a cross-stage class-specific attention mechanism, designed mainly for transformer-based backbones. Specifically, given a coarse prediction, we first use the final-stage features to aggregate class centers over the whole image. High-resolution features from the earlier stage are then used as queries to absorb the semantics of the class centers. To eliminate irrelevant classes within a local area, we build the context for each query position according to the classification scores of the coarse prediction and remove the redundant classes. As a result, only relevant classes provide keys and values in the attention and participate in the value routing. We validate the proposed scheme on several datasets, including ADE20K, Pascal Context and COCO-Stuff, and show that the proposed model improves performance over other works.
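
The steps described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes probability-weighted averaging for the class centers, a top-m selection rule for deciding which classes are relevant at each query position, and plain dot-product attention without learned projections. All function and parameter names (class_centers, class_specific_attention, top_m) are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax; -inf entries receive zero weight."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def class_centers(feats, probs):
    """Aggregate one center per class over the whole image.

    feats: (N, C) final-stage features, one row per spatial position.
    probs: (N, K) coarse class probabilities at the same positions.
    Returns (K, C): probability-weighted mean feature of each class.
    Assumption: weighted averaging; the abstract does not specify the rule.
    """
    weights = probs / (probs.sum(axis=0, keepdims=True) + 1e-6)
    return weights.T @ feats


def class_specific_attention(queries, centers, scores, top_m):
    """Attend from high-resolution queries to the relevant class centers only.

    queries: (M, C) earlier-stage (high-resolution) features.
    centers: (K, C) class centers, used as both keys and values.
    scores:  (M, K) coarse classification scores at each query position.
    top_m:   number of classes kept per query (assumed selection rule).
    Returns (output (M, C), attention weights (M, K)).
    """
    dim = queries.shape[1]
    logits = queries @ centers.T / np.sqrt(dim)

    # Mark the top_m highest-scoring classes at each query position.
    keep_idx = np.argpartition(-scores, top_m - 1, axis=1)[:, :top_m]
    keep = np.zeros(scores.shape, dtype=bool)
    np.put_along_axis(keep, keep_idx, True, axis=1)

    # Classes that are not kept are excluded from the softmax entirely.
    attn = softmax(np.where(keep, logits, -np.inf), axis=1)
    return attn @ centers, attn


# Toy example with random data: 5 classes, 8 channels,
# a 4x4 final-stage map and an 8x8 earlier-stage map.
rng = np.random.default_rng(0)
low_feats = rng.normal(size=(16, 8))
low_probs = softmax(rng.normal(size=(16, 5)))
high_feats = rng.normal(size=(64, 8))
high_scores = softmax(rng.normal(size=(64, 5)))  # stands in for the upsampled coarse prediction

centers = class_centers(low_feats, low_probs)
refined, attn = class_specific_attention(high_feats, centers, high_scores, top_m=2)
print(refined.shape, attn.shape)
```

In this sketch, each query position receives non-zero attention weight on at most top_m class centers, which mirrors the idea that only locally relevant classes provide keys and values.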

Original language: English
Title of host publication: Pattern Recognition and Computer Vision - 5th Chinese Conference, PRCV 2022, Proceedings
Editors: Shiqi Yu, Jianguo Zhang, Zhaoxiang Zhang, Tieniu Tan, Pong C. Yuen, Yike Guo, Junwei Han, Jianhuang Lai
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 558-573
Number of pages: 16
ISBN (Print): 9783031189159
DOIs
State: Published - 2022
Event: 5th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2022 - Shenzhen, China
Duration: 4 Nov 2022 to 7 Nov 2022

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13537 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 5th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2022
Country/Territory: China
City: Shenzhen
Period: 4/11/22 to 7/11/22

Keywords

  • Attention algorithm
  • Semantic segmentation
  • Vision transformer
