TY - GEN
T1 - A novel self-attention based automatic code completion neural network
AU - Wang, Bohao
AU - Lv, Wanyou
AU - Shi, Jianqi
AU - Huang, Yanhong
N1 - Publisher Copyright:
© 2020 Knowledge Systems Institute Graduate School. All rights reserved.
PY - 2020
Y1 - 2020
N2 - Code completion is one branch of source code modeling tasks. Using deep learning methods to implement it has explored the possibilities of modeling source code with statistical language models. The Recurrent Neural Network (RNN) is a universal feature extractor in Natural Language Processing (NLP) and is commonly used in the code completion field. However, RNN-based models lack long-range context dependency and train slowly. Besides, some previous models have not handled the out-of-vocabulary (OOV) issue well, which hinders further improvements in prediction accuracy. This paper presents a novel automatic code completion neural network based on a self-attention mechanism with an open vocabulary, addressing the issues of OOV, slow training, and missing long-range context dependency. Experiments in this paper show that our model predicts tokens more accurately than the traditional N-gram model and an RNN-based model, while reducing training time significantly. More broadly, the combination of self-attention and open vocabulary has potential applications in the source code modeling field.
AB - Code completion is one branch of source code modeling tasks. Using deep learning methods to implement it has explored the possibilities of modeling source code with statistical language models. The Recurrent Neural Network (RNN) is a universal feature extractor in Natural Language Processing (NLP) and is commonly used in the code completion field. However, RNN-based models lack long-range context dependency and train slowly. Besides, some previous models have not handled the out-of-vocabulary (OOV) issue well, which hinders further improvements in prediction accuracy. This paper presents a novel automatic code completion neural network based on a self-attention mechanism with an open vocabulary, addressing the issues of OOV, slow training, and missing long-range context dependency. Experiments in this paper show that our model predicts tokens more accurately than the traditional N-gram model and an RNN-based model, while reducing training time significantly. More broadly, the combination of self-attention and open vocabulary has potential applications in the source code modeling field.
KW - Code Completion
KW - Open Vocabulary
KW - Self-Attention
KW - Source Code Modeling
UR - https://www.scopus.com/pages/publications/85090507726
U2 - 10.18293/SEKE2020-056
DO - 10.18293/SEKE2020-056
M3 - Conference contribution
AN - SCOPUS:85090507726
T3 - Proceedings of the International Conference on Software Engineering and Knowledge Engineering, SEKE
SP - 386
EP - 391
BT - SEKE 2020 - Proceedings of the 32nd International Conference on Software Engineering and Knowledge Engineering
PB - Knowledge Systems Institute Graduate School
T2 - 32nd International Conference on Software Engineering and Knowledge Engineering, SEKE 2020
Y2 - 9 July 2020 through 19 July 2020
ER -