TY - GEN
T1 - Natural Language Processing Service Based on Stroke-Level Convolutional Networks for Chinese Text Classification
AU - Zhuang, Hang
AU - Wang, Chao
AU - Li, Changlong
AU - Wang, Qingfeng
AU - Zhou, Xuehai
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/9/7
Y1 - 2017/9/7
N2 - With the development of deep learning and artificial intelligence, a growing body of research applies neural networks to natural language processing tasks. However, while most of this research uses English corpora as datasets, few studies have used Chinese corpora. Meanwhile, existing Chinese processing algorithms typically treat the Chinese word or the Chinese character as the basic unit and ignore the deeper information within the character. In Chinese linguistics, strokes are the basic units of Chinese characters, similar to the letters of an English word. Inspired by the recent success of deep learning at the character level, we delve deeper to the Chinese stroke level for Chinese language processing. In this paper, we extract basic features of strokes, taking into account similar Chinese character components, propose a new method that leverages Chinese strokes to learn continuous representations of Chinese characters, and develop it into a service for Chinese text classification. We develop a dedicated neural architecture based on the convolutional neural network to learn character embeddings effectively and apply it to Chinese word similarity judgment and Chinese text classification. The results of both experiments show that the stroke-level method is effective for Chinese language processing.
AB - With the development of deep learning and artificial intelligence, a growing body of research applies neural networks to natural language processing tasks. However, while most of this research uses English corpora as datasets, few studies have used Chinese corpora. Meanwhile, existing Chinese processing algorithms typically treat the Chinese word or the Chinese character as the basic unit and ignore the deeper information within the character. In Chinese linguistics, strokes are the basic units of Chinese characters, similar to the letters of an English word. Inspired by the recent success of deep learning at the character level, we delve deeper to the Chinese stroke level for Chinese language processing. In this paper, we extract basic features of strokes, taking into account similar Chinese character components, propose a new method that leverages Chinese strokes to learn continuous representations of Chinese characters, and develop it into a service for Chinese text classification. We develop a dedicated neural architecture based on the convolutional neural network to learn character embeddings effectively and apply it to Chinese word similarity judgment and Chinese text classification. The results of both experiments show that the stroke-level method is effective for Chinese language processing.
UR - https://www.scopus.com/pages/publications/85032332271
U2 - 10.1109/ICWS.2017.46
DO - 10.1109/ICWS.2017.46
M3 - Conference contribution
AN - SCOPUS:85032332271
T3 - Proceedings - 2017 IEEE 24th International Conference on Web Services, ICWS 2017
SP - 404
EP - 411
BT - Proceedings - 2017 IEEE 24th International Conference on Web Services, ICWS 2017
A2 - Chen, Shiping
A2 - Altintas, Ilkay
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 24th IEEE International Conference on Web Services, ICWS 2017
Y2 - 25 June 2017 through 30 June 2017
ER -