Efficient MapReduce-based method for massive entity matching

Pingfu Chao, Zhu Gao, Yuming Li, Junhua Fang, Rong Zhang, Aoying Zhou

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Most state-of-the-art MapReduce-based entity matching methods inherit traditional entity resolution techniques from centralized systems and focus on data blocking strategies to solve the load balancing problem that occurs in distributed environments. In this paper, we propose a MapReduce-based entity matching framework for processing semi-structured and unstructured data. We use a Locality Sensitive Hash (LSH) function to generate low-dimensional signatures for high-dimensional entities, and we introduce a series of randomized algorithms to ensure that similar signatures are matched in the reduce phase with high probability. Moreover, our framework contains a solution for reducing redundant similarity computation. Experiments show that our approach has a large advantage in processing speed while maintaining high accuracy.
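The abstract does not say which LSH family or matching scheme the framework uses, so the following is only a minimal sketch of the general idea, assuming MinHash signatures over token sets and the standard banding trick. Each band of a signature acts as a shuffle key, so entities that agree on any band meet in the same reduce group. All names and parameters here (`NUM_HASHES`, `BANDS`, `map_phase`, `candidate_pairs`) are hypothetical, the map and reduce steps are simulated in a single process, and collecting pairs in a set is just a simple stand-in for avoiding repeated comparisons of the same pair.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

NUM_HASHES = 24             # signature length (assumed value)
BANDS = 12                  # number of bands (assumed value)
ROWS = NUM_HASHES // BANDS  # signature rows per band


def _hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.md5(data).digest()[:8], "big")


def minhash_signature(tokens):
    """Low-dimensional MinHash signature of a non-empty token set."""
    return tuple(min(_hash(seed, t) for t in tokens)
                 for seed in range(NUM_HASHES))


def map_phase(entity_id, text):
    """Emit one ((band index, band values), entity_id) pair per band."""
    tokens = set(text.lower().split())
    if not tokens:
        return  # nothing to hash for an empty record
    sig = minhash_signature(tokens)
    for b in range(BANDS):
        yield (b, sig[b * ROWS:(b + 1) * ROWS]), entity_id


def candidate_pairs(records):
    """Simulate shuffle and reduce: group by band key, then pair up ids."""
    buckets = defaultdict(list)
    for entity_id, text in records.items():
        for key, value in map_phase(entity_id, text):
            buckets[key].append(value)
    pairs = set()  # a pair that collides in several bands is kept once
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


if __name__ == "__main__":
    records = {
        "a": "Apple iPhone 6 16GB",
        "b": "16gb iphone apple 6",
        "c": "Lenovo ThinkPad X1 Carbon",
    }
    print(candidate_pairs(records))
```

Records with identical token sets, such as `a` and `b` above, produce identical signatures and are therefore always returned as a candidate pair. For other records the collision probability depends on their Jaccard similarity and on the chosen `BANDS` and `ROWS`, so candidates would still need an exact similarity check.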

Original language: English
Title of host publication: Web-Age Information Management - 16th International Conference, WAIM 2015, Proceedings
Editors: Jian Li, Yizhou Sun
Publisher: Springer Verlag
Pages: 494-497
Number of pages: 4
ISBN (Electronic): 9783319210414
State: Published - 2015
Event: 16th International Conference on Web-Age Information Management, WAIM 2015 - Qingdao, China
Duration: 8 Jun 2015 to 10 Jun 2015

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9098
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 16th International Conference on Web-Age Information Management, WAIM 2015
Country/Territory: China
City: Qingdao
Period: 8/06/15 to 10/06/15
