TY - GEN
T1 - OD-DETR
T2 - 33rd International Joint Conference on Artificial Intelligence, IJCAI 2024
AU - Wu, Shengjian
AU - Sun, Li
AU - Li, Qingli
N1 - Publisher Copyright:
© 2024 International Joint Conferences on Artificial Intelligence. All rights reserved.
PY - 2024
Y1 - 2024
N2 - DEtection TRansformer (DETR) becomes a dominant paradigm, mainly due to its common architecture with high accuracy and no post-processing. However, DETR suffers from unstable training dynamics. It consumes more data and epochs to converge compared with CNN-based detectors. This paper aims to stabilize DETR training through the online distillation. It utilizes a teacher model, accumulated by Exponential Moving Average (EMA), and distills its knowledge into the online model in following three aspects. First, the matching relation between object queries and ground truth (GT) boxes in the teacher is employed to guide the student, so queries within the student are not only assigned labels based on their own predictions, but also refer to the matching results from the teacher. Second, the teacher's initial query is given to the online student, and its prediction is directly constrained by the corresponding output from the teacher. Finally, the object queries from teacher's different decoder stages are used to build the auxiliary group to accelerate the convergence. For each GT, two queries with the least matching costs are selected into this extra group, and they predict the GT box and participate the optimization. Extensive experiments show that the proposed OD-DETR successfully stabilizes the training, and significantly increases the performance without bringing in more parameters.
AB - DEtection TRansformer (DETR) becomes a dominant paradigm, mainly due to its common architecture with high accuracy and no post-processing. However, DETR suffers from unstable training dynamics. It consumes more data and epochs to converge compared with CNN-based detectors. This paper aims to stabilize DETR training through the online distillation. It utilizes a teacher model, accumulated by Exponential Moving Average (EMA), and distills its knowledge into the online model in following three aspects. First, the matching relation between object queries and ground truth (GT) boxes in the teacher is employed to guide the student, so queries within the student are not only assigned labels based on their own predictions, but also refer to the matching results from the teacher. Second, the teacher's initial query is given to the online student, and its prediction is directly constrained by the corresponding output from the teacher. Finally, the object queries from teacher's different decoder stages are used to build the auxiliary group to accelerate the convergence. For each GT, two queries with the least matching costs are selected into this extra group, and they predict the GT box and participate the optimization. Extensive experiments show that the proposed OD-DETR successfully stabilizes the training, and significantly increases the performance without bringing in more parameters.
UR - https://www.scopus.com/pages/publications/85204298220
M3 - 会议稿件
AN - SCOPUS:85204298220
T3 - IJCAI International Joint Conference on Artificial Intelligence
SP - 1443
EP - 1451
BT - Proceedings of the 33rd International Joint Conference on Artificial Intelligence, IJCAI 2024
A2 - Larson, Kate
PB - International Joint Conferences on Artificial Intelligence
Y2 - 3 August 2024 through 9 August 2024
ER -