SEMI: A scalable entity matching system based on mapreduce

Pingfu Chao, Yuming Li, Zhu Gao, Junhua Fang, Xiaofeng He, Rong Zhang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The MapReduce framework provides a new platform for data integration in distributed environments. We demonstrate a MapReduce-based entity resolution framework that efficiently solves the matching problem for structured, semi-structured and unstructured entities. We propose a random-based data representation method to reduce network transmission, implement our design on MapReduce, and design two solutions for reducing redundant comparisons. Our demo provides an easy-to-use platform for entity matching and performance analysis. We also compare the performance of our algorithm with state-of-the-art blocking-based methods.
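To make the general setting concrete, the following is a minimal, single-process sketch of a generic token-blocking entity matching job written in the map/shuffle/reduce style. It is not SEMI's method: the abstract does not describe the random-based data representation or the two redundancy-reduction solutions, so none of that is reproduced here. It is closer to the blocking-based baseline the abstract mentions. All names (`tokens`, `map_phase`, `shuffle`, `reduce_phase`, `match`), the whitespace tokenizer, the Jaccard similarity, and the 0.5 threshold are hypothetical choices for illustration. The rule "compare a pair only in the block keyed by the smallest token the two records share" is one common way to avoid redundant comparisons when a pair lands in several blocks. It is assumed here, not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def tokens(text):
    """Hypothetical tokenizer: lower-cased set of whitespace-separated words."""
    return frozenset(text.lower().split())


def map_phase(records):
    """Map: emit (blocking_key, (record_id, token_set)) once per distinct token."""
    for rid, text in records.items():
        toks = tokens(text)
        for key in toks:
            yield key, (rid, toks)


def shuffle(pairs):
    """Shuffle: group mapper output by key, as the MapReduce runtime would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def jaccard(a, b):
    """Jaccard similarity of two token sets (assumed similarity measure)."""
    return len(a & b) / len(a | b)


def reduce_phase(key, block, threshold):
    """Reduce: compare record pairs within one block.

    A pair sharing several tokens appears in several blocks. To avoid
    redundant comparisons, it is compared only in the block whose key is
    the lexicographically smallest token the two records share.
    """
    for (id1, t1), (id2, t2) in combinations(block, 2):
        if min(t1 & t2) != key:
            continue  # another reducer owns this pair
        score = jaccard(t1, t2)
        if score >= threshold:
            yield tuple(sorted((id1, id2))), score


def match(records, threshold=0.5):
    """Run map -> shuffle -> reduce and collect matching pairs with scores."""
    groups = shuffle(map_phase(records))
    result = {}
    for key in sorted(groups):
        for pair, score in reduce_phase(key, groups[key], threshold):
            result[pair] = score
    return result


if __name__ == "__main__":
    records = {
        "r1": "Apple iPhone 6 16GB",
        "r2": "apple iphone 6 16gb silver",
        "r3": "Samsung Galaxy S6",
    }
    print(match(records))
```

With these three sample records, `r1` and `r2` share four of five distinct tokens and are reported as a match with score 0.8, while `r3` shares no token with either and is never compared. In a real Hadoop job the shuffle is performed by the framework. Block skew and the cost of emitting one copy of each record per token are exactly the kinds of network and comparison overheads the abstract says SEMI targets.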

Original language: English
Title of host publication: Databases Theory and Applications - 26th Australasian Database Conference, ADC 2015, Proceedings
Editors: Muhammad Aamir Cheema, Jianzhong Qi, Mohamed A. Sharaf
Publisher: Springer Verlag
Pages: 328-332
Number of pages: 5
ISBN (Print): 9783319195476
State: Published - 2015
Event: 26th Australasian Database Conference, ADC 2015 - Melbourne, Australia
Duration: 4 Jun 2015 to 7 Jun 2015

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9093
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 26th Australasian Database Conference, ADC 2015
Country/Territory: Australia
City: Melbourne
Period: 4/06/15 to 7/06/15

