HSNet: hierarchical semantics network for scene parsing

Xin Tan, Jiachen Xu, Ying Cao, Ke Xu, Lizhuang Ma, Rynson W.H. Lau

Research output: Contribution to journal, Article, peer-review

Abstract

Scene parsing is one of the fundamental tasks in computer vision. Humans tend to perceive a scene in a hierarchical manner, i.e., first identifying the coarse category (e.g., vehicle) of a group of objects and then the fine category (e.g., bicycle, truck or car) of each of them. Despite tremendous recent progress in scene parsing, such a hierarchical semantics prior (HSP) has not been explicitly exploited. In this paper, we introduce the HSP into scene parsing by proposing a hierarchical semantics network (HSNet). Our key contribution is a bidirectional cross-level feature matching framework, which enables us to learn multi-level, hierarchy-aware features via forward feature transfer and backward feature regularization. In the forward stage, we train a coarse-to-fine module to learn fine-category features that explicitly encode hierarchical semantics information. In the backward stage, we introduce a fine-to-coarse module that collapses fine-category features into coarse-category features, which are then used to regularize the feature learning of our network. Experimental results on Cityscapes and Pascal Context show that our method achieves state-of-the-art performance. Our visualization also shows that the learned features capture the semantic hierarchy well.
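To make the coarse/fine relationship in the abstract concrete, the following is a minimal sketch of a two-level label hierarchy. It is not the HSNet fine-to-coarse module, which operates on learned features inside a network; it only illustrates, on per-pixel scores, what collapsing fine categories into their coarse parent and then deciding coarse-first could look like. The hierarchy, the names `HIERARCHY`, `collapse_to_coarse`, and `coarse_then_fine`, and the scores are all assumptions chosen for illustration.

```python
# Hypothetical two-level hierarchy (coarse -> fine). The category names are
# illustrative, loosely following the "vehicle" example in the abstract.
HIERARCHY = {
    "vehicle": ["bicycle", "truck", "car"],
    "human": ["person", "rider"],
}

# Invert the hierarchy into a fine -> coarse lookup table.
FINE_TO_COARSE = {
    fine: coarse for coarse, fines in HIERARCHY.items() for fine in fines
}


def collapse_to_coarse(fine_scores):
    """Sum fine-category scores into their coarse parent category."""
    coarse_scores = {coarse: 0.0 for coarse in HIERARCHY}
    for fine, score in fine_scores.items():
        coarse_scores[FINE_TO_COARSE[fine]] += score
    return coarse_scores


def coarse_then_fine(fine_scores):
    """Pick the coarse category first, then the best fine category within it."""
    coarse_scores = collapse_to_coarse(fine_scores)
    coarse = max(coarse_scores, key=coarse_scores.get)
    fine = max(HIERARCHY[coarse], key=lambda f: fine_scores[f])
    return coarse, fine


# Made-up scores for a single pixel (assumption, not data from the paper).
pixel_scores = {
    "bicycle": 0.10, "truck": 0.20, "car": 0.30,
    "person": 0.35, "rider": 0.05,
}
print(coarse_then_fine(pixel_scores))  # ('vehicle', 'car')
```

In this toy example a flat argmax over fine categories would pick "person" (0.35), whereas the coarse-first decision picks "car", because the vehicle group as a whole carries more score. This is only meant to show why a hierarchy prior can change a prediction, not how HSNet computes it.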

Original language: English
Pages (from-to): 2543-2554
Number of pages: 12
Journal: Visual Computer
Volume: 39
Issue number: 7
DOIs
State: Published - Jul 2023
Externally published: Yes

Keywords

  • Bidirectional network
  • Cross-level feature
  • Hierarchical semantics
  • Scene parsing
