UCL-Blocker: Unsupervised contrastive learning with multi-granularity dynamic fusion for entity blocking

  • Yupeng Cao
  • Niannian Shi
  • Yaxin Wei
  • Shumei Liu
  • Chenchen Sun*
  • Bin Yang
  • Yisheng An

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Entity Resolution (ER) aims to identify and merge records that refer to the same entity across diverse data sources. Entity blocking is a key step in entity resolution, as it reduces computational complexity by efficiently generating candidate pairs to minimize redundant comparisons. Recent deep learning-based blocking methods show promise but often require large amounts of labeled data and struggle to capture fine-grained semantics. To address these challenges, we propose an unsupervised entity blocking framework based on contrastive learning with multi-granularity dynamic fusion. The framework consists of two stages: the embedding stage and the block generation stage. In the embedding stage, positive samples are created via data augmentation, with other instances in the batch serving as negatives. To enhance fine-grained semantics, the stage enables interactions among instance vectors and integrates global context through a multi-level similarity fusion mechanism. The fused representations are then used to fine-tune a pre-trained language model via contrastive learning. In the block generation stage, the fine-tuned model produces record embeddings, which are aggregated via average pooling. These aggregated embeddings are then used for efficient similarity computation and candidate ranking, ultimately generating high-quality candidate pairs. This framework effectively balances global semantics and local details, enabling accurate and efficient entity blocking without any labeled data. Experiments on real-world datasets demonstrate that the proposed UCL-Blocker consistently outperforms existing approaches, achieving a 3.92% higher Fα score than the current best blocking method Sudowoodo, verifying the effectiveness of the proposed framework.
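The two stages described in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the function names, the temperature value, the use of cosine similarity, and the top-k selection rule are all assumptions made for illustration, and the multi-granularity dynamic fusion step is omitted because the abstract does not specify it. The loss is a generic InfoNCE-style objective with in-batch negatives; in the real framework the vectors would come from a pre-trained language model rather than hand-written lists.

```python
import math


def mean_pool(token_vectors):
    """Average equal-length token vectors into a single record embedding."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(v[i] for v in token_vectors) / n for i in range(dim)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def in_batch_contrastive_loss(anchors, positives, temperature=0.05):
    """InfoNCE-style loss (assumed form, not taken from the paper).

    positives[i] is the augmented view of anchors[i]; every other
    positives[j] in the batch is treated as a negative for anchors[i].
    """
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / temperature for p in positives]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(anchors)


def top_k_candidates(left, right, k=1):
    """For each left record, keep its k most similar right records."""
    pairs = []
    for i, a in enumerate(left):
        ranked = sorted(
            range(len(right)),
            key=lambda j: cosine(a, right[j]),
            reverse=True,
        )
        pairs.extend((i, j) for j in ranked[:k])
    return pairs
```

A brute-force ranking like `top_k_candidates` is quadratic in the number of records; at realistic scale an approximate nearest-neighbour index would typically replace it, though the abstract does not say which search method the framework uses.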

Original language: English
Article number: 114834
Journal: Applied Soft Computing
Volume: 193
DOI
Publication status: Published - May 2026
