
PS-KD: PatchMix Simulation for High-Fidelity Knowledge Distillation

  • Jiazhen Xu
  • Chong Wang*
  • Sunqi Lin
  • Yuqi Xie
  • Jiangbo Qian
  • Jiafei Wu
  • Yuqi Li
  • *Corresponding author for this work
  • Ningbo University
  • The University of Aizu

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Knowledge Distillation (KD) is a widely used model compression technique that primarily transfers knowledge by aligning the predictions of a student model with those of a teacher model. Besides traditional logit-based KD, combining KD with data augmentation techniques such as MixUp is another effective way to improve distillation efficiency. However, PatchMix, although a powerful data augmentation method, has shown limited effectiveness in CNN-based knowledge distillation. This is likely due to constraints in the CNN teacher’s receptive field and the absence of PatchMix-retrained teacher models. In this paper, we explore why PatchMix tends to be less effective than MixUp, and further introduce a novel framework called PatchMix Simulation Knowledge Distillation (PS-KD). The proposed framework simulates a PatchMix-retrained teacher using a vanilla one to guide the student’s training, ensuring high-fidelity information distillation in feature space. By revisiting the use of PatchMix in CNNs and reducing information distortion, our method can enhance the spatial invariance of CNNs and increase the fidelity of network representations. Extensive experiments demonstrate the superiority of our approach, enabling the network to identify discriminative regions in images with greater accuracy. The code will be released soon.
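
For readers unfamiliar with the two baseline ingredients the abstract mentions, the following is a minimal sketch of generic logit-based KD combined with MixUp. It is not the PS-KD method and is not taken from the paper. The function names (`softmax`, `kd_loss`, `mixup`), the temperature of 4.0, and the Beta parameter `alpha=1.0` are all illustrative assumptions, and inputs are plain Python lists rather than image tensors.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    This is the common logit-based KD objective; the temperature value
    here is an assumption chosen only for illustration.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * math.log(pt / ps)
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0
    )
    return temperature ** 2 * kl


def mixup(x_a, x_b, alpha=1.0, rng=random):
    """MixUp: convex combination of two inputs with lam ~ Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    mixed = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    return mixed, lam
```

In a training loop under these assumptions, the mixed input would be passed through both teacher and student, and `kd_loss` would be applied to the two resulting logit vectors.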

Original language: English
Journal: IEEE Transactions on Cognitive and Developmental Systems
DOI
Publication status: Accepted/In press - 2025
Externally published
