字音和字形能有效增强汉字的表示吗?-基于命名实体识别任务的验证

Translated title of the contribution: Can Phonetics and Orthography Effectively Enhance Chinese Character Representation?

Yufeng Duan, Meicong Zhang, Yanzuo Liu, Guoxiu He

Research output: Contribution to journal › Article › peer-review

Abstract

[Objective] This study investigates whether phonetic and orthographic features effectively enhance the representation of Chinese characters. [Methods] Using the Named Entity Recognition (NER) task, we built a baseline model consisting of a general embedding module as the embedding layer, a bidirectional LSTM module as the context-encoding layer, and a fully connected network with Softmax activation as the decoding layer. We then compared the changes in Micro-F1 scores and entity-specific F1 scores after enhancing the character embeddings with Chinese pinyin, images, Wubi input codes, Four-Corner codes, Cangjie codes, and radicals, on datasets including MSRA, PeopleDaily, CCKS2017, Resume, and E-Commerce. [Results] Embeddings enhanced with phonetic and orthographic features decreased performance by nearly 0.01 on the MSRA and PeopleDaily datasets, while the CCKS2017, Resume, and E-Commerce datasets showed no statistically significant change in performance. [Limitations] Using only 32×32-pixel images of Simplified Chinese characters may affect the extraction of orthographic features. [Conclusions] Although phonetic and orthographic features can enhance the representation of Chinese characters, they also introduce noise, and their impact on model performance varies across corpora and entity types.
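The abstract reports both Micro-F1 and entity-specific F1 scores. As a minimal sketch of how the two differ, the Python below scores predicted entity spans against gold spans. It assumes exact-match span scoring, which the abstract does not specify; the `f1` and `score` helpers and the toy spans are hypothetical and are not taken from the paper.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def score(gold, pred):
    """Return (micro_f1, per_type_f1) for two sets of entity spans.

    Each span is a (sentence_id, start, end, entity_type) tuple.
    Assumption: a prediction counts only if it matches a gold span exactly.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for span in pred:
        (tp if span in gold else fp)[span[3]] += 1
    for span in gold - pred:
        fn[span[3]] += 1
    types = set(tp) | set(fp) | set(fn)
    per_type = {t: f1(tp[t], fp[t], fn[t]) for t in types}
    # Micro-F1 pools the counts over all entity types before computing F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, per_type


# Toy data (hypothetical): the LOC prediction has a wrong end boundary.
gold = {(0, 0, 2, "PER"), (0, 5, 7, "LOC"), (1, 0, 3, "ORG")}
pred = {(0, 0, 2, "PER"), (0, 5, 8, "LOC"), (1, 0, 3, "ORG")}
micro, per_type = score(gold, pred)
print(round(micro, 3), per_type)
```

In this toy case the single boundary error leaves PER and ORG at 1.0, drops LOC to 0.0, and gives a pooled Micro-F1 of about 0.667, which illustrates why an aggregate score and per-entity scores can tell different stories.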

Original language: Chinese (Traditional)
Pages (from-to): 100-111
Number of pages: 12
Journal: Data Analysis and Knowledge Discovery
Volume: 8
Issue number: 10
State: Published - 25 Oct 2024
