Error link detection and correction in wikipedia

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

14 Scopus citations

Abstract

The hyperlink structure of Wikipedia forms a rich semantic network connecting entities and concepts, enabling it as a valuable source for knowledge harvesting. Wikipedia, as crowd-sourced data, faces various data quality issues which significantly impacts knowledge systems depending on it as the information source. One such issue occurs when an anchor text in a Wikipage links to a wrong Wikipage, causing the error link problem. While much of previous work has focused on leveraging Wikipedia for entity linking, little has been done to detect error links. In this paper, we address the error link problem, and propose algorithms to detect and correct error links. We introduce an efficient method to generate candidate error links based on iterative ranking in an Anchor Text Semantic Network. This greatly reduces the problem space. A more accurate pairwise learning model was used to detect error links from the reduced candidate error link set, while suggesting correct links in the same time. This approach is effective when data sparsity is a challenging issue. The experiments on both English and Chinese Wikipedia illustrate the effectiveness of our approach. We also provide a preliminary analysis on possible causes of error links in English and Chinese Wikipedia.

Original languageEnglish
Title of host publicationCIKM 2016 - Proceedings of the 2016 ACM Conference on Information and Knowledge Management
PublisherAssociation for Computing Machinery
Pages307-316
Number of pages10
ISBN (Electronic)9781450340731
DOIs
StatePublished - 24 Oct 2016
Event25th ACM International Conference on Information and Knowledge Management, CIKM 2016 - Indianapolis, United States
Duration: 24 Oct 201628 Oct 2016

Publication series

NameInternational Conference on Information and Knowledge Management, Proceedings
Volume24-28-October-2016

Conference

Conference25th ACM International Conference on Information and Knowledge Management, CIKM 2016
Country/TerritoryUnited States
CityIndianapolis
Period24/10/1628/10/16

Keywords

  • Error link
  • LinkRank
  • Pairwise learning
  • Wikipedia

Fingerprint

Dive into the research topics of 'Error link detection and correction in wikipedia'. Together they form a unique fingerprint.

Cite this