
AutoTrans: Automating Transformer Design via Reinforced Architecture Search

  • Wei Zhu*
  • Xiaoling Wang
  • Yuan Ni
  • Guotong Xie
  • *Corresponding author for this work
  • Ping An Healthcare Tech
  • Pingan Health Technology
  • Ping An International Smart City Technology Company Limited

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

Although transformer architectures dominate many natural language understanding tasks, several issues in training transformer models remain unsolved, notably the need for a principled warm-up scheme, which has proven important for stable training, and the question of whether a given task benefits from scaling the attention product. In this paper, we empirically explore automating the design choices in the transformer model, i.e., how to set layer-norm, whether to scale, the number of layers, the number of heads, the activation function, etc., so that one can obtain a transformer architecture that better suits the task at hand. Reinforcement learning (RL) is employed to navigate the search space, and special parameter-sharing strategies are designed to accelerate the search. We show that sampling a proportion of the training data per epoch during search helps to improve search quality. Experiments on CoNLL03, Multi-30k, and WMT-14 show that the searched transformer model can outperform the standard transformer. In particular, we show that our learned model can be trained more robustly with large learning rates without warm-up.
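
The search loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the option values in SEARCH_SPACE, the Controller class, the subsample helper, and the toy_reward function are all hypothetical names introduced here, the policy is a plain REINFORCE update over independent categorical choices, and the paper's parameter-sharing strategies are not modeled. The reward is an arbitrary placeholder standing in for "train the candidate on the sampled data and score it on validation data".

```python
import math
import random

# Hypothetical search space: the design choices are named in the abstract,
# but the concrete option values below are illustrative assumptions.
SEARCH_SPACE = {
    "layer_norm": ["pre", "post", "none"],
    "scale_attention": [True, False],
    "num_layers": [2, 4, 6],
    "num_heads": [4, 8, 16],
    "activation": ["relu", "gelu", "swish"],
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class Controller:
    """One independent categorical policy per design choice (REINFORCE)."""

    def __init__(self, space, lr=0.1, seed=0):
        self.space = space
        self.lr = lr
        self.rng = random.Random(seed)
        self.logits = {k: [0.0] * len(v) for k, v in space.items()}
        self.baseline = 0.0  # moving-average reward, used to reduce variance

    def sample(self):
        """Sample one option index for every design choice."""
        picks = {}
        for name, options in self.space.items():
            probs = softmax(self.logits[name])
            picks[name] = self.rng.choices(range(len(options)), weights=probs)[0]
        return picks

    def decode(self, picks):
        """Map option indices back to concrete option values."""
        return {name: self.space[name][i] for name, i in picks.items()}

    def update(self, picks, reward):
        """Policy-gradient step: raise the log-probability of rewarded picks."""
        advantage = reward - self.baseline
        self.baseline = 0.9 * self.baseline + 0.1 * reward
        for name, chosen in picks.items():
            probs = softmax(self.logits[name])
            for i, p in enumerate(probs):
                grad = (1.0 if i == chosen else 0.0) - p
                self.logits[name][i] += self.lr * advantage * grad


def subsample(dataset, proportion, rng):
    """Draw a fresh random proportion of the training data for one epoch."""
    k = max(1, int(len(dataset) * proportion))
    return rng.sample(dataset, k)


def toy_reward(arch, data):
    # Placeholder for training the candidate on `data` and validating it.
    # The scoring rule is arbitrary and only makes the sketch runnable.
    score = 0.5
    if arch["activation"] == "gelu":
        score += 0.25
    if arch["layer_norm"] == "pre":
        score += 0.25
    return score


def search(steps=50, proportion=0.25, seed=0):
    """Run the sample, evaluate, update loop and return the trained controller."""
    rng = random.Random(seed)
    train_data = list(range(1000))  # stand-in for real training examples
    controller = Controller(SEARCH_SPACE, seed=seed)
    for _ in range(steps):
        picks = controller.sample()
        epoch_data = subsample(train_data, proportion, rng)
        reward = toy_reward(controller.decode(picks), epoch_data)
        controller.update(picks, reward)
    return controller
```

Calling search() returns a controller whose decode(sample()) output is one candidate configuration. In a real setting, toy_reward would be replaced by actual training and validation of the sampled transformer.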

Original language: English
Title of host publication: Natural Language Processing and Chinese Computing - 10th CCF International Conference, NLPCC 2021, Proceedings
Editors: Lu Wang, Yansong Feng, Yu Hong, Ruifang He
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 169-182
Number of pages: 14
ISBN (Print): 9783030884796
Publication status: Published - 2021
Event: 10th CCF Conference on Natural Language Processing and Chinese Computing, NLPCC 2021 - Qingdao, China
Duration: 13 Oct 2021 to 17 Oct 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13028 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 10th CCF Conference on Natural Language Processing and Chinese Computing, NLPCC 2021
Country/Territory: China
City: Qingdao
Period: 13/10/21 to 17/10/21
