TY - GEN
T1 - BADGE
T2 - 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023
AU - Zhu, Wei
AU - Wang, Peng
AU - Ni, Yuan
AU - Xie, Guotong
AU - Wang, Xiaoling
N1 - Publisher Copyright:
© ACL 2023. All rights reserved.
PY - 2023
Y1 - 2023
N2 - Early exiting can reduce the average latency of pre-trained language models (PLMs) via its adaptive inference mechanism and work with other inference speed-up methods like model pruning, thus drawing much attention from the industry. In this work, we propose a novel framework, BADGE, which consists of two off-the-shelf methods for improving PLMs' early exiting. We first address the issues of training a multi-exit PLM, the backbone model for early exiting. We propose the novel architecture of block-wise bypasses, which can alleviate the conflicts in jointly training multiple intermediate classifiers and thus improve the overall performances of multi-exit PLM while introducing negligible additional FLOPs to the model. Second, we propose a novel divergence-based early exiting (DGE) mechanism, which obtains early exiting signals by comparing the predicted distributions among the current layer and the previous layers' exits. Extensive experiments on three proprietary datasets and three GLUE benchmark tasks demonstrate that our method can obtain a better speedup-performance tradeoff than the existing baseline methods.
AB - Early exiting can reduce the average latency of pre-trained language models (PLMs) via its adaptive inference mechanism and work with other inference speed-up methods like model pruning, thus drawing much attention from the industry. In this work, we propose a novel framework, BADGE, which consists of two off-the-shelf methods for improving PLMs' early exiting. We first address the issues of training a multi-exit PLM, the backbone model for early exiting. We propose the novel architecture of block-wise bypasses, which can alleviate the conflicts in jointly training multiple intermediate classifiers and thus improve the overall performances of multi-exit PLM while introducing negligible additional FLOPs to the model. Second, we propose a novel divergence-based early exiting (DGE) mechanism, which obtains early exiting signals by comparing the predicted distributions among the current layer and the previous layers' exits. Extensive experiments on three proprietary datasets and three GLUE benchmark tasks demonstrate that our method can obtain a better speedup-performance tradeoff than the existing baseline methods.
UR - https://www.scopus.com/pages/publications/85174302090
U2 - 10.18653/v1/2023.acl-industry.48
DO - 10.18653/v1/2023.acl-industry.48
M3 - Conference contribution
AN - SCOPUS:85174302090
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 500
EP - 509
BT - Industry Track
PB - Association for Computational Linguistics (ACL)
Y2 - 9 July 2023 through 14 July 2023
ER -