Entity matching across multiple heterogeneous data sources

  • East China Normal University
  • Technical University of Berlin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Entity matching is the problem of identifying which entities in one data source refer to the same real-world entity as entities in other data sources. Matching entities across heterogeneous data sources is essential to applications such as entity profiling and product recommendation. The matching process is not only prohibitively expensive for large data sources, since it involves all tuples from two or more data sources, but also needs to handle heterogeneous entity attributes. In this paper, we design an unsupervised approach, called EMAN, to match entities across two or more heterogeneous data sources. The algorithm uses a locality-sensitive hashing scheme to reduce the number of candidate tuples and speed up the matching process. To handle heterogeneous entity attributes, we employ the exponential family to model the similarities between different attributes. EMAN is highly accurate and efficient even without any ground-truth tuples. We illustrate the performance of EMAN on re-identifying entities within the same data source, as well as on matching entities across three real data sources. Our experimental results show that the proposed approach outperforms the comparable baseline.
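
The abstract names locality-sensitive hashing as the way to cut down candidate tuples before matching. The sketch below illustrates that general idea with MinHash signatures and banding over character 3-grams. It is not the EMAN implementation: the abstract does not specify the hash family, and every name and parameter here (`shingles`, `minhash_signature`, `candidate_pairs`, `num_hashes`, `bands`) is a hypothetical choice made for illustration.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Return the set of character k-grams of a lower-cased string."""
    s = text.lower()
    return {s[i:i + k] for i in range(max(1, len(s) - k + 1))}


def _hash(seed, token):
    """Deterministic 64-bit hash of a token, salted by an integer seed."""
    digest = hashlib.blake2b(
        token.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(tokens, num_hashes=32):
    """One minimum per seeded hash function; similar sets share many minima."""
    return [min(_hash(seed, t) for t in tokens) for seed in range(num_hashes)]


def candidate_pairs(records, num_hashes=32, bands=16):
    """Group records whose signatures agree on at least one band.

    `records` maps a record id to a text value (an assumed input shape).
    Only records that share a bucket become candidate pairs, so the
    expensive pairwise comparison is skipped for everything else.
    """
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for rec_id, text in records.items():
        sig = minhash_signature(shingles(text), num_hashes)
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(rec_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Hypothetical records drawn from three sources (prefixes a, b, c).
records = {
    "a1": "Apple iPhone 6",
    "b1": "Apple iPhone 6",
    "c1": "zzzz qqqq",
}
pairs = candidate_pairs(records)
```

In this sketch, more bands with fewer rows per band make the filter more permissive, while fewer bands with more rows make it stricter. A full matcher would still score each surviving pair with an attribute-level similarity model, which the abstract attributes to an exponential-family formulation that is not reproduced here.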

Original language: English
Title of host publication: Database Systems for Advanced Applications - 21st International Conference, DASFAA 2016, Proceedings
Editors: Shamkant B. Navathe, Weili Wu, Shashi Shekhar, Xiaoyong Du, Hui Xiong, X. Sean Wang
Publisher: Springer Verlag
Pages: 133-146
Number of pages: 14
ISBN (Print): 9783319320243
DOI
Publication status: Published - 2016
Event: 21st International Conference on Database Systems for Advanced Applications, DASFAA 2016 - Dallas, United States
Duration: 16 Apr 2016 to 19 Apr 2016

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9642
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 21st International Conference on Database Systems for Advanced Applications, DASFAA 2016
Country/Territory: United States
City: Dallas
Period: 16/04/16 to 19/04/16
