
Large Language Model Judged Self-Training for Named Entity Recognition

  • Shisong Chen
  • Jiaan Wang
  • Chengyi Yang
  • Yanghua Xiao*
  • Zhixu Li*
  • Xin Lin

*Corresponding authors of this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Self-training for Named Entity Recognition (NER) aims to identify named entities and their types in text, using self-training to make full use of limited labeled data and a large amount of unlabeled data. The major challenge in self-training is confirmation bias, where incorrect pseudo-labels compound errors. Many efforts have been made to address this challenge, but the scarcity of labeled data limits their performance. In this paper, we introduce a Large Language Model (LLM) into self-training to select high-quality pseudo-labels, leveraging its rich knowledge and few-shot learning capability. Specifically, we design a comprehensive prompt to improve the judgment performance of the LLM, where the prompt incorporates task rules mined by the LLM itself to fully leverage the labeled data. In addition, to reduce the impact of the LLM's hallucinations, we adopt collaborative pseudo-label selection based on combined confidence and calibration-guided probability smoothing. Our empirical study on several NER datasets shows that our method outperforms state-of-the-art approaches. The code is available at https://github.com/cheniison/llm-judged-ST.
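The collaborative pseudo-label selection described in the abstract could be sketched roughly as follows. This is an illustrative assumption, not the authors' implementation: the blending weight, the threshold, and the `toy_judge` stand-in for a real LLM call are all made up for the example.

```python
# Hedged sketch of LLM-judged pseudo-label selection for self-training NER.
# A tagger's confidence for each pseudo-labeled span is blended with a judge
# score; only candidates whose combined confidence clears a threshold are
# kept for the next self-training round.

def combined_confidence(model_conf: float, judge_score: float, alpha: float = 0.5) -> float:
    """Blend tagger confidence with the judge's score (weighting is an assumption)."""
    return alpha * model_conf + (1.0 - alpha) * judge_score

def select_pseudo_labels(candidates, judge, alpha=0.5, threshold=0.8):
    """Keep pseudo-labeled spans whose combined confidence clears the threshold.

    candidates: list of dicts with 'text', 'span', 'type', 'model_conf'.
    judge: callable returning a score in [0, 1] for a candidate
           (a real system would query an LLM here).
    """
    selected = []
    for cand in candidates:
        score = combined_confidence(cand["model_conf"], judge(cand), alpha)
        if score >= threshold:
            selected.append({**cand, "combined_conf": score})
    return selected

# Toy judge: approves candidates whose entity type looks plausible.
def toy_judge(cand):
    return 1.0 if cand["type"] in {"PER", "LOC", "ORG"} else 0.2

candidates = [
    {"text": "Boise hosts WSDM.", "span": "Boise", "type": "LOC", "model_conf": 0.9},
    {"text": "Boise hosts WSDM.", "span": "hosts", "type": "MISC", "model_conf": 0.7},
]
kept = select_pseudo_labels(candidates, toy_judge, alpha=0.5, threshold=0.8)
# Only the "Boise"/LOC candidate survives (0.5*0.9 + 0.5*1.0 = 0.95 >= 0.8).
```

The hard threshold here is a simplification; the paper's calibration-guided probability smoothing would instead adjust the confidence distribution before selection.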

Original language: English
Title of host publication: WSDM 2026 - Proceedings of the 19th ACM International Conference on Web Search and Data Mining
Publisher: Association for Computing Machinery, Inc
Pages: 69-78
Number of pages: 10
ISBN (electronic): 9798400722929
DOI
Publication status: Published - 21 Feb 2026
Event: 19th ACM International Conference on Web Search and Data Mining, WSDM 2026 - Boise, United States
Duration: 22 Feb 2026 - 26 Feb 2026

Publication series

Name: WSDM 2026 - Proceedings of the 19th ACM International Conference on Web Search and Data Mining

Conference

Conference: 19th ACM International Conference on Web Search and Data Mining, WSDM 2026
Country/Territory: United States
City: Boise
Period: 22/02/26 - 26/02/26
