UCL-Blocker: Unsupervised contrastive learning with multi-granularity dynamic fusion for entity blocking

  • Yupeng Cao
  • Niannian Shi
  • Yaxin Wei
  • Shumei Liu
  • Chenchen Sun*
  • Bin Yang
  • Yisheng An

* Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Entity Resolution (ER) aims to identify and merge records that refer to the same entity across diverse data sources. Entity blocking is a key step in entity resolution, as it reduces computational complexity by efficiently generating candidate pairs to minimize redundant comparisons. Recent deep learning-based blocking methods show promise but often require large amounts of labeled data and struggle to capture fine-grained semantics. To address these challenges, we propose an unsupervised entity blocking framework based on contrastive learning with multi-granularity dynamic fusion. The framework consists of two stages: the embedding stage and the block generation stage. In the embedding stage, positive samples are created via data augmentation, with other instances in the batch serving as negatives. To enhance fine-grained semantics, the stage enables interactions among instance vectors and integrates global context through a multi-level similarity fusion mechanism. The fused representations are then used to fine-tune a pre-trained language model via contrastive learning. In the block generation stage, the fine-tuned model produces record embeddings, which are aggregated via average pooling. These aggregated embeddings are then used for efficient similarity computation and candidate ranking, ultimately generating high-quality candidate pairs. This framework effectively balances global semantics and local details, enabling accurate and efficient entity blocking without any labeled data. Experiments on real-world datasets demonstrate that the proposed UCL-Blocker consistently outperforms existing approaches, achieving a 3.92% higher Fα score than the current best blocking method Sudowoodo, verifying the effectiveness of the proposed framework.
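
The two stages described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not name the exact loss, so an InfoNCE-style objective with in-batch negatives is assumed here, and the multi-granularity dynamic fusion mechanism is not reproduced. The function names (`mean_pool`, `info_nce_loss`, `top_k_candidates`) and the temperature value are hypothetical choices made for illustration.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def mean_pool(token_embeddings, mask):
    """Average token vectors per record, ignoring padded positions.

    token_embeddings: (n_records, n_tokens, dim); mask: (n_records, n_tokens).
    """
    mask = np.asarray(mask, dtype=float)[..., None]
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1.0, None)
    return summed / counts


def info_nce_loss(anchors, positives, temperature=0.05):
    """Assumed contrastive objective with in-batch negatives.

    Row i of `positives` is the augmented view of row i of `anchors`;
    every other row in the batch acts as a negative.
    """
    logits = l2_normalize(anchors) @ l2_normalize(positives).T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


def top_k_candidates(left, right, k):
    """Pair each left record with its k most cosine-similar right records."""
    sims = l2_normalize(left) @ l2_normalize(right).T
    k = min(k, sims.shape[1])
    top = np.argsort(-sims, axis=1)[:, :k]
    return [(i, int(j)) for i in range(top.shape[0]) for j in top[i]]
```

In this sketch, `info_nce_loss` would stand in for the fine-tuning objective of the embedding stage, while `mean_pool` followed by `top_k_candidates` mirrors the block generation stage: pooled record embeddings are ranked by similarity and the top `k` matches per record become candidate pairs. A larger `k` trades more comparisons for higher recall.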

Original language: English
Article number: 114834
Journal: Applied Soft Computing
Volume: 193
State: Published - May 2026

Keywords

  • Contrastive learning
  • Data augmentation
  • Dynamic fusion
  • Unsupervised entity blocking
