
BADGE: Speeding Up BERT Inference after Deployment via Block-wise BypAsses and DiverGence-Based Early Exiting

  • Wei Zhu
  • Peng Wang
  • Yuan Ni
  • Guotong Xie
  • Xiaoling Wang*
  • *Corresponding author for this work
  • East China Normal University
  • Tomorrow Advancing Life
  • Pingan Health Tech

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Early exiting can reduce the average latency of pre-trained language models (PLMs) via its adaptive inference mechanism and can work with other inference speed-up methods like model pruning, thus drawing much attention from the industry. In this work, we propose a novel framework, BADGE, which consists of two off-the-shelf methods for improving PLMs' early exiting. We first address the issues of training a multi-exit PLM, the backbone model for early exiting. We propose the novel architecture of block-wise bypasses, which can alleviate the conflicts in jointly training multiple intermediate classifiers and thus improve the overall performance of the multi-exit PLM while introducing negligible additional FLOPs to the model. Second, we propose a novel divergence-based early exiting (DGE) mechanism, which obtains early exiting signals by comparing the predicted distributions among the current layer and the previous layers' exits. Extensive experiments on three proprietary datasets and three GLUE benchmark tasks demonstrate that our method can obtain a better speedup-performance tradeoff than the existing baseline methods.
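
To make the divergence-based exit signal concrete, here is a minimal Python sketch of the general idea. It is not the authors' implementation: the abstract does not say which divergence, threshold, or comparison window the paper uses. The symmetric KL divergence, the `threshold` and `window` parameters, and the function names are all assumptions chosen for illustration. For simplicity the sketch takes every exit's distribution up front, whereas a real multi-exit model would compute them layer by layer and stop early.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def symmetric_kl(p, q):
    """Symmetrised KL divergence (an assumed choice, not taken from the paper)."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


def divergence_based_exit(exit_distributions, threshold=0.01, window=1):
    """Pick the exit at which predictions have stopped changing.

    exit_distributions: per-exit predicted class distributions, in layer order.
    threshold, window: hypothetical hyperparameters for this sketch.

    Returns (exit_index, distribution). The model exits at the first layer
    whose distribution is within `threshold` of each of the previous `window`
    exits, and falls back to the final exit otherwise.
    """
    for layer, current in enumerate(exit_distributions):
        if layer < window:
            continue  # not enough earlier exits to compare against
        previous = exit_distributions[layer - window:layer]
        if all(symmetric_kl(current, prev) < threshold for prev in previous):
            return layer, current
    last = len(exit_distributions) - 1
    return last, exit_distributions[last]
```

Calling `divergence_based_exit` on a list of softmax outputs, one per intermediate classifier, returns the index of the exit that would be used. A smaller `threshold` or a larger `window` makes the rule more conservative, trading speed for agreement with the full model.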

Original language: English
Title of host publication: Industry Track
Publisher: Association for Computational Linguistics (ACL)
Pages: 500-509
Number of pages: 10
ISBN (Electronic): 9781959429685
DOIs
State: Published - 2023
Event: 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023 - Toronto, Canada
Duration: 9 Jul 2023 to 14 Jul 2023

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
Volume: 5
ISSN (Print): 0736-587X

Conference

Conference: 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023
Country/Territory: Canada
City: Toronto
Period: 9/07/23 to 14/07/23
