Harvesting facts from textual web sources by constrained label propagation

  • Yafang Wang*
  • , Bin Yang
  • , Lizhen Qu
  • , Marc Spaniol
  • , Gerhard Weikum
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

51 Scopus citations

Abstract

There have been major advances on automatically constructing large knowledge bases by extracting relational facts from Web and text sources. However, the world is dynamic: periodic events like sports competitions need to be interpreted with their respective timepoints, and facts such as coaching a sports team, holding political or business positions, and even marriages do not hold forever and should be augmented by their respective timespans. This paper addresses the problem of automatically harvesting temporal facts with such extended time-awareness. We employ pattern-based gathering techniques for fact candidates and construct a weighted pattern-candidate graph. Our key contribution is a system called PRAVDA based on a new kind of label propagation algorithm with a judiciously designed loss function, which iteratively processes the graph to label good temporal facts for a given set of target relations. Our experiments with online news and Wikipedia articles demonstrate the accuracy of this method.

Original languageEnglish
Title of host publicationCIKM'11 - Proceedings of the 2011 ACM International Conference on Information and Knowledge Management
Pages837-846
Number of pages10
DOIs
StatePublished - 2011
Externally publishedYes
Event20th ACM Conference on Information and Knowledge Management, CIKM'11 - Glasgow, United Kingdom
Duration: 24 Oct 201128 Oct 2011

Publication series

NameInternational Conference on Information and Knowledge Management, Proceedings

Conference

Conference20th ACM Conference on Information and Knowledge Management, CIKM'11
Country/TerritoryUnited Kingdom
CityGlasgow
Period24/10/1128/10/11

Keywords

  • knowledge harvesting
  • label propagation
  • temporal facts

Fingerprint

Dive into the research topics of 'Harvesting facts from textual web sources by constrained label propagation'. Together they form a unique fingerprint.

Cite this