ND-NER: A Named Entity Recognition Dataset for OSINT Towards the National Defense Domain

  • Xinyan Li
  • Dongxu Li
  • Zhihao Yang
  • Hui Zhao*
  • Wei Cai
  • Xi Lin
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Scopus citations

Abstract

The public data on the Internet contains a large amount of high-value open source intelligence (OSINT) for national defense. As a fundamental information extraction task, Named Entity Recognition (NER) plays a key role in question answering systems, knowledge graphs, and reasoning. However, NER for the national defense domain has made little progress because suitable datasets are unavailable. Most previous methods work mainly on general-purpose datasets, which do not capture the particular characteristics of the national defense domain. In this paper, we propose ND-NER, a Chinese NER dataset for national defense built from data crawled from Sina Weibo. It is the first public human-annotated NER dataset for OSINT in the national defense domain, with 19 entity types and 418,227 tokens. We construct two baseline tasks and implement a series of popular models on the dataset. The empirical results show that ND-NER is challenging because of long entities with nested structure, domain specialization, ambiguous entity boundaries, and the informal, colloquial language of social media. We believe that ND-NER, published at https://github.com/XinyanLi2016/ND-NER, will encourage further exploration of OSINT in the national defense domain.

Original language: English
Title of host publication: Neural Information Processing - 29th International Conference, ICONIP 2022, Proceedings
Editors: Mohammad Tanveer, Sonali Agarwal, Seiichi Ozawa, Asif Ekbal, Adam Jatowt
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 361-372
Number of pages: 12
ISBN (Print): 9789819916412
State: Published - 2023
Event: 29th International Conference on Neural Information Processing, ICONIP 2022 - Virtual, Online
Duration: 22 Nov 2022 to 26 Nov 2022

Publication series

Name: Communications in Computer and Information Science
Volume: 1792 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 29th International Conference on Neural Information Processing, ICONIP 2022
City: Virtual, Online
Period: 22/11/22 to 26/11/22

Keywords

  • Dataset
  • Named Entity Recognition
  • National Defense Domain
  • Nested Named Entity Recognition
  • Open Source Intelligence

