Abstract
Entity Resolution (ER) aims to identify and merge records that refer to the same entity across diverse data sources. Entity blocking is a key step in entity resolution, as it reduces computational complexity by efficiently generating candidate pairs to minimize redundant comparisons. Recent deep learning-based blocking methods show promise but often require large amounts of labeled data and struggle to capture fine-grained semantics. To address these challenges, we propose an unsupervised entity blocking framework based on contrastive learning with multi-granularity dynamic fusion. The framework consists of two stages: the embedding stage and the block generation stage. In the embedding stage, positive samples are created via data augmentation, with other instances in the batch serving as negatives. To enhance fine-grained semantics, the stage enables interactions among instance vectors and integrates global context through a multi-level similarity fusion mechanism. The fused representations are then used to fine-tune a pre-trained language model via contrastive learning. In the block generation stage, the fine-tuned model produces record embeddings, which are aggregated via average pooling. These aggregated embeddings are then used for efficient similarity computation and candidate ranking, ultimately generating high-quality candidate pairs. This framework effectively balances global semantics and local details, enabling accurate and efficient entity blocking without any labeled data. Experiments on real-world datasets demonstrate that the proposed UCL-Blocker consistently outperforms existing approaches, achieving a 3.92% higher Fα score than the current best blocking method Sudowoodo, verifying the effectiveness of the proposed framework.
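As a rough illustration of the two stages described above, the following minimal sketch shows an in-batch contrastive loss (each record's augmented view is its positive, and the other views in the batch act as negatives) and a block generation step that average-pools token vectors and ranks candidates by cosine similarity. It is a sketch under stated assumptions, not the authors' implementation: the function names, the InfoNCE-style loss form, the temperature of 0.05, and the top-k cutoff are all hypothetical choices. Plain Python lists stand in for the outputs of a pre-trained language model, and the multi-granularity dynamic fusion mechanism is not modeled.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_pool(token_vectors: Sequence[Sequence[float]]) -> Vector:
    """Average a record's token vectors into one record embedding."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(tok[d] for tok in token_vectors) / n for d in range(dim)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def in_batch_contrastive_loss(
    anchors: Sequence[Sequence[float]],
    positives: Sequence[Sequence[float]],
    temperature: float = 0.05,  # assumed value, not taken from the paper
) -> float:
    """InfoNCE-style loss: positives[i] is the augmented view of anchors[i];
    every other positives[j] in the batch serves as a negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_sum - logits[i]
    return total / len(anchors)


def top_k_candidates(
    left: Sequence[Sequence[float]],
    right: Sequence[Sequence[float]],
    k: int,
) -> List[Tuple[int, int, float]]:
    """For each left record, keep the k most similar right records
    as candidate pairs (left_index, right_index, similarity)."""
    pairs: List[Tuple[int, int, float]] = []
    for i, u in enumerate(left):
        scored = sorted(
            ((cosine(u, v), j) for j, v in enumerate(right)), reverse=True
        )
        pairs.extend((i, j, score) for score, j in scored[:k])
    return pairs
```

The exhaustive pairwise scan in `top_k_candidates` is only for readability; at realistic scale an approximate nearest-neighbor index would typically replace it so that blocking stays cheaper than comparing all pairs.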
| Original language | English |
|---|---|
| Article number | 114834 |
| Journal | Applied Soft Computing |
| Volume | 193 |
| State | Published - May 2026 |
Keywords
- Contrastive learning
- Data augmentation
- Dynamic fusion
- Unsupervised entity blocking
Title: UCL-Blocker: Unsupervised contrastive learning with multi-granularity dynamic fusion for entity blocking