TY - GEN
T1 - Large Language Model Judged Self-Training for Named Entity Recognition
AU - Chen, Shisong
AU - Wang, Jiaan
AU - Yang, Chengyi
AU - Xiao, Yanghua
AU - Li, Zhixu
AU - Lin, Xin
N1 - Publisher Copyright:
© 2026 Owner/Author.
PY - 2026/2/21
Y1 - 2026/2/21
AB - Self-training for Named Entity Recognition (NER) aims to identify named entities and their types in text, using self-training to make full use of limited labeled data and a large amount of unlabeled data. The major challenge in self-training is confirmation bias, where incorrect pseudo-labels compound errors. Many efforts have been made to address this challenge, but the scarcity of labeled data limits their performance. In this paper, we introduce a Large Language Model (LLM) into self-training to select high-quality pseudo-labels, leveraging its rich knowledge and few-shot learning capability. Specifically, we design a comprehensive prompt to improve the judgment performance of the LLM, where the prompt incorporates task rules mined by the LLM itself to fully leverage the labeled data. In addition, to reduce the impact of LLM hallucinations, we adopt collaborative pseudo-label selection based on combined confidence and calibration-guided probability smoothing. Our empirical study on several NER datasets shows that our method outperforms state-of-the-art approaches. The code is available at https://github.com/cheniison/llm-judged-ST.
KW - large language model
KW - named entity recognition
KW - self-training
UR - https://www.scopus.com/pages/publications/105033158933
U2 - 10.1145/3773966.3778009
DO - 10.1145/3773966.3778009
M3 - Conference contribution
AN - SCOPUS:105033158933
T3 - WSDM 2026 - Proceedings of the 19th ACM International Conference on Web Search and Data Mining
SP - 69
EP - 78
BT - WSDM 2026 - Proceedings of the 19th ACM International Conference on Web Search and Data Mining
PB - Association for Computing Machinery, Inc
T2 - 19th ACM International Conference on Web Search and Data Mining, WSDM 2026
Y2 - 22 February 2026 through 26 February 2026
ER -