ClusVPR: Efficient Visual Place Recognition With Clustering-Based Weighted Transformer

  • Yifan Xu
  • Pourya Shamsolmoali*
  • Masoume Zareapoor
  • Jie Yang*

*Corresponding author for this work

Shanghai Jiao Tong University

Research output: Contribution to journal › Article › peer-review

Abstract

Visual place recognition (VPR) is a highly challenging task that has a wide range of applications, including robot navigation and self-driving vehicles. VPR is a difficult task due to duplicate regions and insufficient attention to small objects in complex scenes, resulting in recognition deviations. In this article, we present ClusVPR, a novel approach that tackles the specific issues of redundant information in duplicate regions and representations of small objects. Different from existing methods that rely on convolutional neural networks (CNNs) for feature map generation, ClusVPR introduces a unique paradigm called clustering-based weighted transformer network (CWTNet). CWTNet uses the power of clustering-based weighted feature maps and integrates global dependencies to effectively address visual deviations encountered in large-scale VPR problems. We also introduce the optimized-VLAD (OptLAD) layer, which significantly reduces the number of parameters and enhances model efficiency. This layer is specifically designed to aggregate the information obtained from scale-wise image patches. Additionally, our pyramid self-supervised strategy focuses on extracting representative and diverse features from scale-wise image patches rather than from entire images. This approach is essential for capturing a broader range of information required for robust VPR. Extensive experiments on four VPR datasets show our model's superior performance compared to existing models while being less complex.
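
For readers unfamiliar with VLAD-style aggregation, which the abstract's OptLAD layer is described as optimizing, the following is a minimal sketch of classical hard-assignment VLAD. It is not the paper's OptLAD layer, whose details the abstract does not give. The function name `vlad_aggregate`, the toy descriptors, and the centroids are assumptions chosen for illustration only.

```python
import math


def vlad_aggregate(descriptors, centroids):
    """Classical hard-assignment VLAD (illustrative; NOT the paper's OptLAD).

    descriptors: list of local feature vectors (e.g. one per image patch).
    centroids:   list of cluster centres with the same dimensionality.
    Returns a flat, L2-normalised vector of length len(centroids) * dim.
    """
    k, dim = len(centroids), len(centroids[0])
    residuals = [[0.0] * dim for _ in range(k)]

    for desc in descriptors:
        # Assign the descriptor to its nearest centroid (squared Euclidean distance).
        nearest = min(
            range(k),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(desc, centroids[j])),
        )
        # Accumulate the residual (descriptor minus centroid) for that cluster.
        for i in range(dim):
            residuals[nearest][i] += desc[i] - centroids[nearest][i]

    # Intra-normalisation: L2-normalise each cluster's residual sum separately.
    flat = []
    for row in residuals:
        norm = math.sqrt(sum(x * x for x in row))
        flat.extend(x / norm if norm > 0 else 0.0 for x in row)

    # Final L2 normalisation of the concatenated vector.
    total = math.sqrt(sum(x * x for x in flat))
    return [x / total if total > 0 else 0.0 for x in flat]


# Toy usage: three 2-D descriptors, two hypothetical centroids.
global_descriptor = vlad_aggregate(
    [[1.0, 0.0], [3.0, 0.0], [0.0, 5.0]],
    [[2.0, 0.0], [0.0, 4.0]],
)
```

In this toy case the two descriptors assigned to the first centroid have residuals that cancel, so only the second cluster contributes to the final vector. The output length grows with the number of clusters times the feature dimension, which is one reason aggregation layers of this kind are a natural target for parameter reduction.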

Original language: English
Pages (from-to): 1038-1049
Number of pages: 12
Journal: IEEE Transactions on Artificial Intelligence
Volume: 6
Issue number: 4
State: Published - 2025

Keywords

  • Self-supervised
  • vision transformer
  • visual place recognition
