TY - GEN
T1 - A Crowdsourcing Based Human-in-the-Loop Framework for Denoising UUs in Relation Extraction Tasks
AU - Li, Mengting
AU - Jin, Jian
AU - Wu, Wen
AU - Yang, Yan
AU - He, Liang
AU - Yang, Jing
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/7
Y1 - 2019/7
N2 - In relation extraction tasks, distant supervision methods expand dataset by aligning entity pairs in different knowledge bases and completing the relations between two entities. However, these methods ignore the fact that sentences labels generated by distant supervision methods with high confidence are often incorrect in the real world called Unknown Unknowns (UUs). To deal with this challenge, we propose a crowdsourcing based human-in-the-loop denoising framework which iteratively discovers UUs and corrects them by crowdsourcing to better extract relations. During each epoch of iterations, we choose one sentence bag and repeat two steps: Firstly, attention based Long Short-Term Memory network is applied as a selector to discover potential UUs. Secondly, these UUs are annotated by crowdsourcing with two answer collecting strategies and fed back into selector as positive samples. Until the accuracy of selector reaches a threshold, all annotated samples are added into relation classifier as cleaned train set and framework moves on to next epoch with new sentence bags. The experiments on the New York Times dataset and analysis of potential UUs demonstrate that our framework denoise the dataset and outperforms all the baselines on distant supervision relation extraction tasks.
AB - In relation extraction tasks, distant supervision methods expand dataset by aligning entity pairs in different knowledge bases and completing the relations between two entities. However, these methods ignore the fact that sentences labels generated by distant supervision methods with high confidence are often incorrect in the real world called Unknown Unknowns (UUs). To deal with this challenge, we propose a crowdsourcing based human-in-the-loop denoising framework which iteratively discovers UUs and corrects them by crowdsourcing to better extract relations. During each epoch of iterations, we choose one sentence bag and repeat two steps: Firstly, attention based Long Short-Term Memory network is applied as a selector to discover potential UUs. Secondly, these UUs are annotated by crowdsourcing with two answer collecting strategies and fed back into selector as positive samples. Until the accuracy of selector reaches a threshold, all annotated samples are added into relation classifier as cleaned train set and framework moves on to next epoch with new sentence bags. The experiments on the New York Times dataset and analysis of potential UUs demonstrate that our framework denoise the dataset and outperforms all the baselines on distant supervision relation extraction tasks.
UR - https://www.scopus.com/pages/publications/85073232515
U2 - 10.1109/IJCNN.2019.8851951
DO - 10.1109/IJCNN.2019.8851951
M3 - 会议稿件
AN - SCOPUS:85073232515
T3 - Proceedings of the International Joint Conference on Neural Networks
BT - 2019 International Joint Conference on Neural Networks, IJCNN 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 International Joint Conference on Neural Networks, IJCNN 2019
Y2 - 14 July 2019 through 19 July 2019
ER -