Knowledge distillation with a precise teacher and prediction with abstention

Yi Xu, Jian Pu*, Hui Zhao

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Knowledge distillation, which aims to train a student model under the supervision of a larger teacher model, has achieved remarkable results in supervised learning. However, existing knowledge distillation methods suffer from two major problems: the teacher's supervision is sometimes misleading, and the student's predictions are not accurate enough. To address the first issue, instead of learning from a combination of the teacher and the ground truth, we apply knowledge adjustment to correct the teacher's supervision using the ground truth. For the second problem, we train the student model within the selective classification framework. In particular, the deep gambler loss is adopted to predict with reservation by explicitly introducing an (m + 1)-th abstention class. We evaluate the effectiveness of our method in two knowledge distillation settings: (1) distillation across different network structures (AlexNet, ResNet), and (2) distillation across networks of different depths (ResNet18, ResNet50). Experimental results on benchmark datasets (i.e., Fashion-MNIST, SVHN, CIFAR10, CIFAR100) show higher prediction accuracies and lower coverage errors.
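The abstract names two concrete components: correcting misleading teacher supervision with the ground truth, and training the student with the deep gambler loss over an extra abstention class. The paper's own code is not reproduced here; the following PyTorch sketch is an illustrative assumption of how each component could look. The function names, the probability-swap form of knowledge adjustment, and the payoff hyperparameter `o` are assumptions for exposition, not the authors' released implementation.

```python
# Hypothetical sketch of the two components described in the abstract.
# Names and exact formulations are assumptions, not the authors' code.
import torch
import torch.nn.functional as F

def knowledge_adjustment(teacher_logits: torch.Tensor,
                         targets: torch.Tensor) -> torch.Tensor:
    """Correct misleading teacher supervision using the ground truth.

    Assumed form: when the teacher's top-1 prediction disagrees with the
    label, swap the probability of the wrongly top-ranked class with the
    probability of the ground-truth class, so the corrected distribution
    ranks the true class first while preserving the teacher's soft targets.
    """
    probs = teacher_logits.softmax(dim=1).clone()
    top1 = probs.argmax(dim=1)
    idx = (top1 != targets).nonzero(as_tuple=True)[0]  # misranked samples
    p_top = probs[idx, top1[idx]].clone()
    probs[idx, top1[idx]] = probs[idx, targets[idx]]
    probs[idx, targets[idx]] = p_top                   # swap the two masses
    return probs

def gambler_loss(student_logits: torch.Tensor,
                 targets: torch.Tensor,
                 o: float = 2.5) -> torch.Tensor:
    """Deep gambler loss with an explicit (m + 1)-th abstention output.

    student_logits has m + 1 columns; the last column is the reservation
    (abstain) class. o > 1 is the payoff hyperparameter: a smaller o makes
    abstention cheaper, so the model reserves judgment more often.
    """
    probs = student_logits.softmax(dim=1)
    p_true = probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    p_abstain = probs[:, -1]
    return -(p_true + p_abstain / o).log().mean()
```

In a full training loop one would presumably combine the gambler loss on the student's (m + 1)-way output with a distillation term (e.g., KL divergence between the student's first m outputs and the adjusted teacher distribution); the exact weighting is not specified in the abstract.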

Original language: English
Title of host publication: Proceedings of ICPR 2020 - 25th International Conference on Pattern Recognition
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 9000-9006
Number of pages: 7
ISBN (Electronic): 9781728188089
DOIs
State: Published - 2020
Event: 25th International Conference on Pattern Recognition, ICPR 2020 - Virtual, Milan, Italy
Duration: 10 Jan 2021 - 15 Jan 2021

Publication series

Name: Proceedings - International Conference on Pattern Recognition
ISSN (Print): 1051-4651

Conference

Conference: 25th International Conference on Pattern Recognition, ICPR 2020
Country/Territory: Italy
City: Virtual, Milan
Period: 10/01/21 - 15/01/21
