Aggregating rich deep semantic features for fine-grained place classification

  • Tingyu Wei
  • Wenxin Hu*
  • Xingjiao Wu
  • Yingbin Zheng
  • Hao Ye
  • Jing Yang
  • Liang He

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

This paper proposes a method that aggregates rich deep semantic features for fine-grained place classification. As is well known, the category of an image depends on its objects and text as well as on its semantic regions, hierarchical structure, and spatial layout. However, most recently designed fine-grained classification systems ignore this: the complex multi-level semantic structure of images associated with fine-grained classes has not yet been well explored. Therefore, in this work, our approach is composed of two modules: a Content Estimator (CNE) and a Context Estimator (CXE). CNE generates deep content features by encoding global visual cues of images. CXE obtains rich context features of images and consists of three child estimators: a Text Context Estimator (TCE), an Object Context Estimator (OCE), and a Scene Context Estimator (SCE). Given an input image, TCE encodes text cues to identify word-level semantic information, OCE extracts high-dimensional features and maps them to object semantic information, and SCE captures hierarchical structure and spatial layout information by recognizing scene cues. To aggregate rich deep semantic features, we fuse the information from CNE and CXE for fine-grained classification. To the best of our knowledge, this is the first work to leverage text information from an arbitrary-oriented scene text detector for extracting context information. Moreover, our method explores the fusion of semantic features and demonstrates that scene features provide complementary information to the other cues. Furthermore, the proposed approach achieves state-of-the-art performance on a fine-grained classification dataset, reaching 84.3% on Con-Text.
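The abstract describes fusing the content features from CNE with the three context feature streams of CXE (TCE, OCE, SCE). The paper's exact fusion operation is not given here, so the sketch below assumes simple concatenation of the feature vectors; the function name and feature dimensions are illustrative, not taken from the paper.

```python
import numpy as np

def aggregate_features(content, text_ctx, object_ctx, scene_ctx):
    """Hypothetical sketch of the feature aggregation step:
    concatenate the CXE streams (TCE, OCE, SCE) into one context
    vector, then concatenate it with the CNE content vector.
    The actual fusion in the paper may differ."""
    context = np.concatenate([text_ctx, object_ctx, scene_ctx])
    return np.concatenate([content, context])

# Toy feature dimensions (assumptions, not from the paper).
cne = np.zeros(512)                                  # content features
tce, oce, sce = np.zeros(128), np.zeros(256), np.zeros(365)  # context streams
fused = aggregate_features(cne, tce, oce, sce)
print(fused.shape)  # (1261,)
```

The fused vector would then feed a classifier over the fine-grained place categories.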

Original language: English
Title of host publication: Artificial Neural Networks and Machine Learning – ICANN 2019
Subtitle of host publication: Image Processing - 28th International Conference on Artificial Neural Networks, 2019, Proceedings
Editors: Igor V. Tetko, Pavel Karpov, Fabian Theis, Vera Kurková
Publisher: Springer Verlag
Pages: 55-67
Number of pages: 13
ISBN (Print): 9783030305079
DOIs
State: Published - 2019
Event: 28th International Conference on Artificial Neural Networks: Workshop and Special Sessions, ICANN 2019 - Munich, Germany
Duration: 17 Sep 2019 – 19 Sep 2019

Publication series

Name: Lecture Notes in Computer Science
Volume: 11729 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 28th International Conference on Artificial Neural Networks: Workshop and Special Sessions, ICANN 2019
Country/Territory: Germany
City: Munich
Period: 17/09/19 – 19/09/19

Keywords

  • Fine-grained place classification
  • Scene features
  • Scene text detector
  • Semantic features
