TY - JOUR
T1 - Multi-Prototype Space Learning for Commonsense-Based Scene Graph Generation
AU - Chen, Lianggangxu
AU - Song, Youqi
AU - Cai, Yiqing
AU - Lu, Jiale
AU - Li, Yang
AU - Xie, Yuan
AU - Wang, Changbo
AU - He, Gaoqi
N1 - Publisher Copyright:
© 2024, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2024/3/25
Y1 - 2024/3/25
N2 - In the domain of scene graph generation, modeling commonsense as a single-prototype representation has been typically employed to facilitate the recognition of infrequent predicates. However, a fundamental challenge lies in the large intra-class variations of the visual appearance of predicates, resulting in subclasses within a predicate class. Such a challenge typically leads to the problem of misclassifying diverse predicates due to the rough predicate space clustering. In this paper, inspired by cognitive science, we maintain multi-prototype representations for each predicate class, which can accurately find the multiple class centers of the predicate space. Technically, we propose a novel multi-prototype learning framework consisting of three main steps: prototype-predicate matching, prototype updating, and prototype space optimization. We first design a triple-level optimal transport to match each predicate feature within the same class to a specific prototype. In addition, the prototypes are updated using momentum updating to find the class centers according to the matching results. Finally, we enhance the inter-class separability of the prototype space through iterations of the inter-class separability loss and intra-class compactness loss. Extensive evaluations demonstrate that our approach significantly outperforms state-of-the-art methods on the Visual Genome dataset.
AB - In the domain of scene graph generation, modeling commonsense as a single-prototype representation has been typically employed to facilitate the recognition of infrequent predicates. However, a fundamental challenge lies in the large intra-class variations of the visual appearance of predicates, resulting in subclasses within a predicate class. Such a challenge typically leads to the problem of misclassifying diverse predicates due to the rough predicate space clustering. In this paper, inspired by cognitive science, we maintain multi-prototype representations for each predicate class, which can accurately find the multiple class centers of the predicate space. Technically, we propose a novel multi-prototype learning framework consisting of three main steps: prototype-predicate matching, prototype updating, and prototype space optimization. We first design a triple-level optimal transport to match each predicate feature within the same class to a specific prototype. In addition, the prototypes are updated using momentum updating to find the class centers according to the matching results. Finally, we enhance the inter-class separability of the prototype space through iterations of the inter-class separability loss and intra-class compactness loss. Extensive evaluations demonstrate that our approach significantly outperforms state-of-the-art methods on the Visual Genome dataset.
UR - https://www.scopus.com/pages/publications/85189522005
U2 - 10.1609/aaai.v38i2.27874
DO - 10.1609/aaai.v38i2.27874
M3 - 会议文章
AN - SCOPUS:85189522005
SN - 2159-5399
VL - 38
SP - 1129
EP - 1137
JO - Proceedings of the AAAI Conference on Artificial Intelligence
JF - Proceedings of the AAAI Conference on Artificial Intelligence
IS - 2
T2 - 38th AAAI Conference on Artificial Intelligence, AAAI 2024
Y2 - 20 February 2024 through 27 February 2024
ER -