TY - GEN
T1 - Aggregating rich deep semantic features for fine-grained place classification
AU - Wei, Tingyu
AU - Hu, Wenxin
AU - Wu, Xingjiao
AU - Zheng, Yingbin
AU - Ye, Hao
AU - Yang, Jing
AU - He, Liang
N1 - Publisher Copyright:
© Springer Nature Switzerland AG 2019.
PY - 2019
Y1 - 2019
N2 - This paper proposes a method that aggregates rich deep semantic features for fine-grained place classification. It is well known that the category of an image depends on its objects and text as well as on its semantic regions, hierarchical structure, and spatial layout. However, most recently designed fine-grained classification systems ignore this: the complex multi-level semantic structure of images associated with fine-grained classes has not yet been well explored. Therefore, in this work, our approach is composed of two modules: a Content Estimator (CNE) and a Context Estimator (CXE). CNE generates deep content features by encoding the global visual cues of an image. CXE obtains rich context features and consists of three child estimators: a Text Context Estimator (TCE), an Object Context Estimator (OCE), and a Scene Context Estimator (SCE). Given an input image, TCE encodes text cues to identify word-level semantic information, OCE extracts high-dimensional features and maps them to object semantic information, and SCE captures hierarchical structure and spatial layout information by recognizing scene cues. To aggregate rich deep semantic features, we fuse the information from CNE and CXE for fine-grained classification. To the best of our knowledge, this is the first work to leverage the text information from an arbitrary-oriented scene text detector for extracting context information. Moreover, our method explores the fusion of semantic features and demonstrates that scene features provide information complementary to the other cues. Furthermore, the proposed approach achieves state-of-the-art performance on a fine-grained classification dataset, reaching 84.3% on Con-Text.
AB - This paper proposes a method that aggregates rich deep semantic features for fine-grained place classification. It is well known that the category of an image depends on its objects and text as well as on its semantic regions, hierarchical structure, and spatial layout. However, most recently designed fine-grained classification systems ignore this: the complex multi-level semantic structure of images associated with fine-grained classes has not yet been well explored. Therefore, in this work, our approach is composed of two modules: a Content Estimator (CNE) and a Context Estimator (CXE). CNE generates deep content features by encoding the global visual cues of an image. CXE obtains rich context features and consists of three child estimators: a Text Context Estimator (TCE), an Object Context Estimator (OCE), and a Scene Context Estimator (SCE). Given an input image, TCE encodes text cues to identify word-level semantic information, OCE extracts high-dimensional features and maps them to object semantic information, and SCE captures hierarchical structure and spatial layout information by recognizing scene cues. To aggregate rich deep semantic features, we fuse the information from CNE and CXE for fine-grained classification. To the best of our knowledge, this is the first work to leverage the text information from an arbitrary-oriented scene text detector for extracting context information. Moreover, our method explores the fusion of semantic features and demonstrates that scene features provide information complementary to the other cues. Furthermore, the proposed approach achieves state-of-the-art performance on a fine-grained classification dataset, reaching 84.3% on Con-Text.
KW - Fine-grained place classification
KW - Scene features
KW - Scene text detector
KW - Semantic features
UR - https://www.scopus.com/pages/publications/85072870030
U2 - 10.1007/978-3-030-30508-6_5
DO - 10.1007/978-3-030-30508-6_5
M3 - Conference contribution
AN - SCOPUS:85072870030
SN - 9783030305079
T3 - Lecture Notes in Computer Science
SP - 55
EP - 67
BT - Artificial Neural Networks and Machine Learning – ICANN 2019
A2 - Tetko, Igor V.
A2 - Karpov, Pavel
A2 - Theis, Fabian
A2 - Kůrková, Věra
PB - Springer Verlag
T2 - 28th International Conference on Artificial Neural Networks: Workshop and Special Sessions, ICANN 2019
Y2 - 17 September 2019 through 19 September 2019
ER -