TY - GEN
T1 - Adaptive Hierarchy-Branch Fusion for Online Knowledge Distillation
AU - Gong, Linrui
AU - Lin, Shaohui
AU - Zhang, Baochang
AU - Shen, Yunhang
AU - Li, Ke
AU - Qiao, Ruizhi
AU - Ren, Bo
AU - Li, Muqing
AU - Yu, Zhou
AU - Ma, Lizhuang
N1 - Publisher Copyright:
Copyright © 2023, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2023/6/27
Y1 - 2023/6/27
N2 - Online Knowledge Distillation (OKD) is designed to alleviate the dilemma that a high-capacity pre-trained teacher model is not available. However, existing methods mostly focus on improving the ensemble prediction accuracy of multiple students (a.k.a. branches) and often overlook the homogenization problem, which makes student models saturate quickly and hurts performance. We assume that the intrinsic bottleneck behind the homogenization problem lies in the identical branch architectures and the coarse ensemble strategy. We propose a novel Adaptive Hierarchy-Branch Fusion framework for Online Knowledge Distillation, termed AHBF-OKD, which designs hierarchical branches and an adaptive hierarchy-branch fusion module to boost model diversity and learn complementary knowledge. Specifically, we first introduce hierarchical branch architectures to construct diverse peers by monotonically increasing the depth of branches on the basis of the target branch. To effectively transfer knowledge from the most complex branch to the simplest target branch, we propose an adaptive hierarchy-branch fusion module that creates hierarchical teacher assistants recursively and regards the target branch as the smallest teacher assistant. During training, the teacher assistant from the previous hierarchy is explicitly distilled by the teacher assistant and the branch from the current hierarchy. Thus, importance scores are effectively and adaptively allocated to different branches to reduce branch homogenization. Extensive experiments demonstrate the effectiveness of AHBF-OKD on different datasets, including CIFAR-10/100 and ImageNet 2012. For example, the distilled ResNet18 achieves a Top-1 error of 29.28% on ImageNet 2012, which significantly outperforms state-of-the-art methods. The source code is available at https://github.com/linruigong965/AHBF.
AB - Online Knowledge Distillation (OKD) is designed to alleviate the dilemma that a high-capacity pre-trained teacher model is not available. However, existing methods mostly focus on improving the ensemble prediction accuracy of multiple students (a.k.a. branches) and often overlook the homogenization problem, which makes student models saturate quickly and hurts performance. We assume that the intrinsic bottleneck behind the homogenization problem lies in the identical branch architectures and the coarse ensemble strategy. We propose a novel Adaptive Hierarchy-Branch Fusion framework for Online Knowledge Distillation, termed AHBF-OKD, which designs hierarchical branches and an adaptive hierarchy-branch fusion module to boost model diversity and learn complementary knowledge. Specifically, we first introduce hierarchical branch architectures to construct diverse peers by monotonically increasing the depth of branches on the basis of the target branch. To effectively transfer knowledge from the most complex branch to the simplest target branch, we propose an adaptive hierarchy-branch fusion module that creates hierarchical teacher assistants recursively and regards the target branch as the smallest teacher assistant. During training, the teacher assistant from the previous hierarchy is explicitly distilled by the teacher assistant and the branch from the current hierarchy. Thus, importance scores are effectively and adaptively allocated to different branches to reduce branch homogenization. Extensive experiments demonstrate the effectiveness of AHBF-OKD on different datasets, including CIFAR-10/100 and ImageNet 2012. For example, the distilled ResNet18 achieves a Top-1 error of 29.28% on ImageNet 2012, which significantly outperforms state-of-the-art methods. The source code is available at https://github.com/linruigong965/AHBF.
UR - https://www.scopus.com/pages/publications/85167979642
U2 - 10.1609/aaai.v37i6.25937
DO - 10.1609/aaai.v37i6.25937
M3 - Conference contribution
AN - SCOPUS:85167979642
T3 - Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023
SP - 7731
EP - 7739
BT - AAAI-23 Technical Tracks 6
A2 - Williams, Brian
A2 - Chen, Yiling
A2 - Neville, Jennifer
PB - AAAI Press
T2 - 37th AAAI Conference on Artificial Intelligence, AAAI 2023
Y2 - 7 February 2023 through 14 February 2023
ER -