TY - JOUR
T1 - Chinese Language Processing Based on Stroke Representation and Multidimensional Representation
AU - Zhuang, Hang
AU - Wang, Chao
AU - Li, Changlong
AU - Li, Yijing
AU - Wang, Qingfeng
AU - Zhou, Xuehai
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2018/7/26
Y1 - 2018/7/26
N2 - With the development of deep learning and artificial intelligence, deep neural networks are increasingly being applied for natural language processing tasks. However, the majority of research on natural language processing focuses on alphabetic languages. Few studies have paid attention to the characteristics of ideographic languages, such as the Chinese language. In addition, the existing Chinese processing algorithms typically regard Chinese words or Chinese characters as the basic units while ignoring the information contained within the deeper architecture of Chinese characters. In the Chinese language, each Chinese character can be split into several components, or strokes. This means that strokes are the basic units of a Chinese character, in a manner similar to the letters of an English word. Inspired by the success of character-level neural networks, we delve deeper into Chinese writing at the stroke level for Chinese language processing. We extract the basic features of strokes by considering similar Chinese characters to learn a continuous representation of Chinese characters. Furthermore, word embeddings trained at different granularities are not exactly the same. In this paper, we propose an algorithm for combining different representations of Chinese words within a single neural network to obtain a better word representation. We develop a Chinese word representation service for several natural language processing tasks, and cloud computing is introduced to deal with preprocessing challenges and the training of basic representations from different dimensions.
KW - Chinese word representation
KW - automatic text summarization
KW - convolutional neural networks
KW - multidimensional word representation
KW - natural language processing
KW - stroke-based word representation
KW - text classification
KW - word similarity
UR - https://www.scopus.com/pages/publications/85050764397
U2 - 10.1109/ACCESS.2018.2860058
DO - 10.1109/ACCESS.2018.2860058
M3 - Article
AN - SCOPUS:85050764397
SN - 2169-3536
VL - 6
SP - 41928
EP - 41941
JO - IEEE Access
JF - IEEE Access
M1 - 8421226
ER -