Knowledge-Empowered Representation Learning for Chinese Medical Reading Comprehension: Task, Model and Resources

  • Taolin Zhang
  • Chengyu Wang
  • Minghui Qiu
  • Bite Yang
  • Zerui Cai
  • Xiaofeng He*
  • Jun Huang

* Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations

Abstract

Machine Reading Comprehension (MRC) aims to extract answers to questions given a passage, which has been widely studied recently especially in open domains. However, few efforts have been made on closed-domain MRC, mainly due to the lack of large-scale training data. In this paper, we introduce a multi-target MRC task for the medical domain, whose goal is to predict answers to medical questions and the corresponding support sentences from medical information sources simultaneously, in order to ensure the high reliability of medical knowledge serving. A high-quality dataset (more than 18k samples) is manually constructed for the purpose, named Multi-task Chinese Medical MRC dataset (CMedMRC), with detailed analysis conducted. We further propose a Chinese medical BERT model for the task (CMedBERT), which fuses medical knowledge into pre-trained language models by the dynamic fusion mechanism of heterogeneous features and the multi-task learning strategy. Experiments show that CMedBERT consistently outperforms strong baselines by fusing context-aware and knowledge-aware token representations.
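
The multi-target setup described in the abstract, where a system returns both an answer and its support sentence, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Prediction` class, its field names, and the `joint_exact_match` scoring function are hypothetical and are not taken from CMedMRC or CMedBERT, and the example strings are invented toy data.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """Hypothetical container for the two targets of the task."""
    answer: str            # answer text extracted from the passage
    support_sentence: str  # sentence offered as evidence for the answer


def joint_exact_match(pred: Prediction, gold: Prediction) -> dict:
    """Score each target by exact string match (an assumed metric).

    'joint' is 1 only when both the answer and the support sentence match.
    """
    answer_ok = pred.answer.strip() == gold.answer.strip()
    support_ok = pred.support_sentence.strip() == gold.support_sentence.strip()
    return {
        "answer": int(answer_ok),
        "support": int(support_ok),
        "joint": int(answer_ok and support_ok),
    }


# Invented toy data, not drawn from any dataset.
gold = Prediction(answer="rest", support_sentence="Mild cases usually need rest.")
good = Prediction(answer="rest", support_sentence="Mild cases usually need rest.")
partial = Prediction(answer="rest", support_sentence="See a doctor if symptoms persist.")

print(joint_exact_match(good, gold))
print(joint_exact_match(partial, gold))
```

Scoring the two targets jointly reflects the abstract's motivation: an answer counts as reliable only when the evidence offered for it is also correct.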

Original language: English
Title of host publication: Findings of the Association for Computational Linguistics
Subtitle of host publication: ACL-IJCNLP 2021
Editors: Chengqing Zong, Fei Xia, Wenjie Li, Roberto Navigli
Publisher: Association for Computational Linguistics (ACL)
Pages: 2237-2249
Number of pages: 13
ISBN (Electronic): 9781954085541
State: Published - 2021
Event: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 - Virtual, Online
Duration: 1 Aug 2021 to 6 Aug 2021

Publication series

Name: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Conference

Conference: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
City: Virtual, Online
Period: 1/08/21 to 6/08/21
