TY - GEN
T1 - Knowledge-Empowered Representation Learning for Chinese Medical Reading Comprehension
T2 - Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
AU - Zhang, Taolin
AU - Wang, Chengyu
AU - Qiu, Minghui
AU - Yang, Bite
AU - Cai, Zerui
AU - He, Xiaofeng
AU - Huang, Jun
N1 - Publisher Copyright:
© 2021 Association for Computational Linguistics
PY - 2021
Y1 - 2021
N2 - Machine Reading Comprehension (MRC) aims to extract answers to questions given a passage, which has been widely studied recently especially in open domains. However, few efforts have been made on closed-domain MRC, mainly due to the lack of large-scale training data. In this paper, we introduce a multi-target MRC task for the medical domain, whose goal is to predict answers to medical questions and the corresponding support sentences from medical information sources simultaneously, in order to ensure the high reliability of medical knowledge serving. A high-quality dataset (more than 18k samples) is manually constructed for the purpose, named Multi-task Chinese Medical MRC dataset (CMedMRC), with detailed analysis conducted. We further propose a Chinese medical BERT model for the task (CMedBERT), which fuses medical knowledge into pre-trained language models by the dynamic fusion mechanism of heterogeneous features and the multi-task learning strategy. Experiments show that CMedBERT consistently outperforms strong baselines by fusing context-aware and knowledge-aware token representations.
UR - https://www.scopus.com/pages/publications/85123929603
U2 - 10.18653/v1/2021.findings-acl.197
DO - 10.18653/v1/2021.findings-acl.197
M3 - Conference contribution
AN - SCOPUS:85123929603
T3 - Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
SP - 2237
EP - 2249
BT - Findings of the Association for Computational Linguistics
A2 - Zong, Chengqing
A2 - Xia, Fei
A2 - Li, Wenjie
A2 - Navigli, Roberto
PB - Association for Computational Linguistics (ACL)
Y2 - 1 August 2021 through 6 August 2021
ER -