AutoTrans: Automating Transformer Design via Reinforced Architecture Search

Wei Zhu, Xiaoling Wang, Yuan Ni, Guotong Xie

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

11 Scopus citations

Abstract

Although transformer architectures dominate many natural language understanding tasks, open issues remain in training transformer models, notably the lack of a principled warm-up schedule, which has proven important for stable training, and the question of whether a given task benefits from scaling the attention product. In this paper, we empirically explore automating the design choices in the transformer model, i.e., how to place layer normalization, whether to scale the attention product, the number of layers, the number of heads, the activation function, etc., so as to obtain a transformer architecture better suited to the task at hand. Reinforcement learning is employed to navigate the search space, and special parameter-sharing strategies are designed to accelerate the search. We show that sampling a proportion of the training data per epoch during the search helps improve search quality. Experiments on CoNLL03, Multi-30k, and WMT-14 show that the searched transformer models can outperform standard transformers. In particular, we show that our learned model can be trained more robustly with large learning rates and without warm-up.
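The abstract does not include code, so the following is only a rough illustration of the reinforced search loop it describes: a controller samples transformer design choices (layer-norm placement, attention scaling, depth, heads, activation) and is updated by REINFORCE from a validation reward. The search space values, the ReinforceController class, and the evaluate stub are all hypothetical, not the authors' implementation.

    import math
    import random

    # Hypothetical search space mirroring the design choices named in the
    # abstract; the paper's exact options and ranges are not given here.
    SEARCH_SPACE = {
        "layer_norm": ["pre", "post"],      # where to place layer normalization
        "scale_attn": [True, False],        # scale the attention product or not
        "num_layers": [4, 6, 8],
        "num_heads": [4, 8, 16],
        "activation": ["relu", "gelu", "swish"],
    }

    class ReinforceController:
        """Minimal tabular REINFORCE controller over independent categorical
        choices; a sketch of the RL search loop, not the paper's controller."""

        def __init__(self, space, lr=0.1):
            self.space = space
            self.lr = lr
            # One logit per option, initialized uniformly.
            self.logits = {k: [0.0] * len(v) for k, v in space.items()}
            self.baseline = 0.0

        def _probs(self, key):
            exps = [math.exp(l) for l in self.logits[key]]
            z = sum(exps)
            return [e / z for e in exps]

        def sample(self):
            """Sample one architecture; also return the chosen indices."""
            arch, idx = {}, {}
            for k, options in self.space.items():
                p = self._probs(k)
                i = random.choices(range(len(options)), weights=p)[0]
                arch[k], idx[k] = options[i], i
            return arch, idx

        def update(self, idx, reward):
            """REINFORCE update with a moving-average baseline.
            d log p_i / d logit_j = (1 if j == i else 0) - p_j."""
            self.baseline = 0.9 * self.baseline + 0.1 * reward
            adv = reward - self.baseline
            for k, i in idx.items():
                p = self._probs(k)
                for j in range(len(p)):
                    grad = (1.0 - p[j]) if j == i else -p[j]
                    self.logits[k][j] += self.lr * adv * grad

    def evaluate(arch):
        """Stand-in for training the sampled transformer on a subset of the
        training data and returning a validation score (the actual reward
        in the paper). Placeholder only."""
        return random.random()

    controller = ReinforceController(SEARCH_SPACE)
    for step in range(100):
        arch, idx = controller.sample()
        controller.update(idx, evaluate(arch))

In this sketch the per-epoch data subsampling and parameter sharing mentioned in the abstract would live inside evaluate, which is left as a placeholder.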

Original language: English
Title of host publication: Natural Language Processing and Chinese Computing - 10th CCF International Conference, NLPCC 2021, Proceedings
Editors: Lu Wang, Yansong Feng, Yu Hong, Ruifang He
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 169-182
Number of pages: 14
ISBN (Print): 9783030884796
State: Published - 2021
Event: 10th CCF Conference on Natural Language Processing and Chinese Computing, NLPCC 2021 - Qingdao, China
Duration: 13 Oct 2021 - 17 Oct 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13028 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 10th CCF Conference on Natural Language Processing and Chinese Computing, NLPCC 2021
Country/Territory: China
City: Qingdao
Period: 13/10/21 - 17/10/21

Keywords

  • Neural architecture search
  • Reinforcement learning
  • Transformer network
