Efficient mapReduce-based method for massive entity matching

  • Institute for Data Science and Engineering
  • East China Normal University

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Peer-reviewed

Abstract

Most state-of-the-art MapReduce-based entity matching methods inherit traditional Entity Resolution techniques designed for centralized systems, and they focus on data blocking strategies to solve the load-balancing problem that arises in distributed environments. In this paper, we propose a MapReduce-based entity matching framework for processing semi-structured and unstructured data. We use a Locality Sensitive Hashing (LSH) function to generate low-dimensional signatures for high-dimensional entities, and we introduce a series of randomized algorithms to ensure that similar signatures are matched in the reduce phase with high probability. Moreover, our framework includes a solution for reducing redundant similarity computation. Experiments show that our approach has a substantial advantage in processing speed while maintaining high accuracy.
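The abstract does not say which LSH family or which randomized algorithms the framework uses. As a minimal sketch, assuming MinHash signatures over character 3-shingles and the standard banding technique (a well-known LSH scheme swapped in here for illustration, not necessarily the authors' method), the map phase can emit one key per signature band so that entities with similar signatures meet in the same reducer with high probability. The reducer then deduplicates candidate pairs so that each pair's similarity is computed only once. All function names and parameters (`NUM_HASHES`, `BANDS`, `threshold`) are hypothetical, and the MapReduce shuffle is simulated in memory.

```python
import hashlib
import itertools
from collections import defaultdict

# Hypothetical parameters (not from the paper): signature length and band count.
NUM_HASHES = 20
BANDS = 10
ROWS = NUM_HASHES // BANDS  # rows per band


def shingles(text, k=3):
    """Character k-shingles of a lowercased string (the high-dimensional representation)."""
    t = text.lower()
    return {t[i:i + k] for i in range(max(1, len(t) - k + 1))}


def _hash(seed, token):
    """Deterministic 64-bit hash of (seed, token); each seed acts as one hash function."""
    digest = hashlib.sha1(f"{seed}:{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(tokens):
    """Low-dimensional MinHash signature: the minimum hash value per hash function."""
    return [min(_hash(seed, tok) for tok in tokens) for seed in range(NUM_HASHES)]


def map_phase(entity_id, text):
    """Map: emit ((band index, band values), entity_id) once per band."""
    sig = minhash_signature(shingles(text))
    for b in range(BANDS):
        yield (b, tuple(sig[b * ROWS:(b + 1) * ROWS])), entity_id


def shuffle(records):
    """Simulated shuffle: group entity ids by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: entities sharing a band key become candidate pairs.
    The set drops pairs found in several bands, so each pair is verified only once."""
    candidates = set()
    for ids in groups.values():
        candidates.update(itertools.combinations(sorted(ids), 2))
    return candidates


def jaccard(a, b):
    """Exact Jaccard similarity of two shingle sets."""
    return len(a & b) / len(a | b)


def match(entities, threshold=0.8):
    """Run map, shuffle and reduce, then verify candidates with exact Jaccard similarity."""
    records = [r for eid, text in entities.items() for r in map_phase(eid, text)]
    candidates = reduce_phase(shuffle(records))
    sets = {eid: shingles(text) for eid, text in entities.items()}
    return {(a, b) for a, b in candidates if jaccard(sets[a], sets[b]) >= threshold}


if __name__ == "__main__":
    entities = {
        "e1": "Efficient MapReduce-based method for massive entity matching",
        "e2": "EFFICIENT MAPREDUCE-BASED METHOD FOR MASSIVE ENTITY MATCHING",
        "e3": "Locality sensitive hashing for nearest neighbour search",
    }
    print(match(entities))  # {('e1', 'e2')}
```

With `ROWS` = r rows per band and `BANDS` = b bands, a pair whose shingle sets have Jaccard similarity s shares at least one band key with probability 1 - (1 - s^r)^b, which is why banding sends similar signatures to the same reducer with high probability while dissimilar pairs rarely collide.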

Original language: English
Host publication title: Web-Age Information Management - 16th International Conference, WAIM 2015, Proceedings
Editors: Jian Li, Yizhou Sun
Publisher: Springer Verlag
Pages: 494-497
Number of pages: 4
ISBN (Electronic): 9783319210414
DOI
Publication status: Published - 2015
Event: 16th International Conference on Web-Age Information Management, WAIM 2015 - Qingdao, China
Duration: 8 Jun 2015 - 10 Jun 2015

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9098
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 16th International Conference on Web-Age Information Management, WAIM 2015
Country/Territory: China
City: Qingdao
Period: 8/06/15 - 10/06/15
