DPQ: dynamic pseudo-mean mixed-precision quantization for pruned neural network

  • Songwen Pei*
  • Jiyao Wang
  • Bingxue Zhang
  • Wei Qin
  • Hai Xue
  • Xiaochun Ye
  • Mingsong Chen

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

The layers and hyper-parameters of deep neural networks keep growing as large-scale models are trained on massive datasets, which makes such models difficult to deploy on resource-constrained edge devices. Mixed-precision quantization can prune and compress deep neural network models, but discovering the optimal bit width for each layer remains a challenge. To address this challenge, we propose dynamic pseudo-mean mixed-precision quantization (DPQ), which introduces two-bit scaling factors to compensate for quantization errors. We further propose an activation quantization scheme named random parameters clipping (RPC), which quantizes only part of the activations to reduce the loss of accuracy. DPQ dynamically adjusts the bit precision of weight quantization according to the distribution of the weights, yielding a quantization scheme that is more robust than previous methods. Extensive experiments demonstrate that DPQ achieves a 15.43× compression rate for ResNet20 on the CIFAR-10 dataset with a 0.22% increase in accuracy, and a 35.25× compression rate for ResNet56 on the SVHN dataset with a 0.12% increase in accuracy.
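The abstract describes DPQ only at a high level. As a rough illustration of the two ideas it names, choosing a per-layer bit width from the weight distribution and correcting the quantization scale to compensate error, the following Python sketch may help. It is a minimal stand-in, not the authors' algorithm: the function name `pseudo_mean_quantize`, the spread thresholds, and the mean-magnitude correction are all hypothetical, and the paper's two-bit scaling factors and RPC activation clipping are not reproduced here.

```python
import numpy as np

def pseudo_mean_quantize(weights, bit_candidates=(2, 4, 8)):
    """Hypothetical sketch of distribution-driven mixed-precision
    quantization. NOT the paper's DPQ algorithm; it only illustrates
    (1) picking a bit width per layer from the weight distribution and
    (2) correcting the scale so dequantized weights better match the
    originals (a "pseudo-mean" match, assumed for illustration)."""
    # (1) Dynamic bit selection: wider weight distributions get more
    # bits (an assumed stand-in for DPQ's dynamic precision rule).
    spread = float(np.std(weights))
    if spread < 0.01:
        bits = bit_candidates[0]
    elif spread < 0.05:
        bits = bit_candidates[1]
    else:
        bits = bit_candidates[2]

    # Symmetric uniform quantization to `bits` bits.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(weights).max() / qmax + 1e-12
    q = np.clip(np.round(weights / scale), -qmax, qmax)

    # (2) Error compensation: rescale so the mean magnitude of the
    # dequantized tensor matches that of the original weights. The
    # paper instead uses two-bit scaling factors, whose exact form is
    # not given in the abstract.
    deq = q * scale
    correction = np.abs(weights).mean() / (np.abs(deq).mean() + 1e-12)
    return q.astype(np.int8), scale * correction, bits

# Usage on a toy layer: quantize, then dequantize with the corrected scale.
w = np.random.randn(64, 64).astype(np.float32) * 0.05
q, scale, bits = pseudo_mean_quantize(w)
w_hat = q.astype(np.float32) * scale
print(f"{bits}-bit, reconstruction MSE: {np.mean((w - w_hat) ** 2):.2e}")
```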

Original language: English
Pages (from-to): 4099-4112
Number of pages: 14
Journal: Machine Learning
Volume: 113
Issue number: 7
DOIs
State: Published - Jul 2024

Keywords

  • Big data
  • Compression
  • Deep learning
  • Pruned neural network
  • Pseudo-mean mixed-precision quantization
