TY - GEN
T1 - ND-NER
T2 - 29th International Conference on Neural Information Processing, ICONIP 2022
AU - Li, Xinyan
AU - Li, Dongxu
AU - Yang, Zhihao
AU - Zhao, Hui
AU - Cai, Wei
AU - Lin, Xi
N1 - Publisher Copyright:
© 2023, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
PY - 2023
Y1 - 2023
AB - Public data on the Internet contains a large amount of high-value open source intelligence (OSINT) relevant to national defense. As a fundamental information extraction task, Named Entity Recognition (NER) plays a key role in question answering systems, knowledge graphs, and reasoning. However, NER in the national defense domain has made little progress because of a lack of available datasets. Most previous methods are developed on general-purpose datasets, which offer little insight into the particularities of national defense. In this paper, we propose ND-NER, a Chinese NER dataset for national defense built from data crawled from Sina Weibo. It is the first public human-annotated NER dataset for OSINT in the national defense domain, with 19 entity types and 418,227 tokens. We construct two baseline tasks and implement a series of popular models on the dataset. The empirical results show that ND-NER is challenging because of long entities with nested structure, domain specialization, ambiguous entity boundaries, and the informality and colloquialism of social media. We believe that ND-NER, published at https://github.com/XinyanLi2016/ND-NER, will encourage further exploration of OSINT in the national defense domain.
KW - Dataset
KW - Named Entity Recognition
KW - National Defense Domain
KW - Nested Named Entity Recognition
KW - Open Source Intelligence
UR - https://www.scopus.com/pages/publications/85161638317
U2 - 10.1007/978-981-99-1642-9_31
DO - 10.1007/978-981-99-1642-9_31
M3 - Conference contribution
AN - SCOPUS:85161638317
SN - 9789819916412
T3 - Communications in Computer and Information Science
SP - 361
EP - 372
BT - Neural Information Processing - 29th International Conference, ICONIP 2022, Proceedings
A2 - Tanveer, Mohammad
A2 - Agarwal, Sonali
A2 - Ozawa, Seiichi
A2 - Ekbal, Asif
A2 - Jatowt, Adam
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 22 November 2022 through 26 November 2022
ER -