Homography Estimation With Adaptive Query Transformer and Gated Interaction Module

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Homography estimation is essential for aligning images captured from different viewpoints by accurately modeling the geometric relationship between them. In homography estimation, global information plays a critical role. To establish global correspondences, cross-attention has been widely used in recent studies. However, vanilla cross-attention mechanisms treat queries in redundant and low-texture areas the same as those in richly textured areas, leading to the accumulation and propagation of erroneous information. We define this phenomenon, where the model excessively attends to queries in redundant and low-texture areas, as query over-focusing. To alleviate query over-focusing and achieve fine-grained homography estimation, we propose a novel homography estimation network, termed AGNet, which integrates an Adaptive Query Transformer (AQFormer) and a Gated Interaction Module (GIM). The AQFormer is designed to dynamically adjust attention by applying a mask to queries, allowing the model to adaptively emphasize feature-rich regions while suppressing redundant or weakly textured areas. Meanwhile, the GIM selectively captures local information by adjusting convolutional kernels based on input, enhancing the extraction of shared features between image pairs. Extensive experiments on various datasets demonstrate that AGNet significantly improves accuracy in homography estimation, particularly in challenging scenarios with low overlap and large viewpoint variations.
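
The query-masking idea described in the abstract can be illustrated with a small sketch. This is not the AQFormer implementation: the abstract does not say how the mask is computed, so the variance-based `texture_gate` below is a hypothetical stand-in for whatever learned mask the network uses, and all function names and the threshold are assumptions made for illustration. The sketch only shows the general effect of gating queries in cross-attention, where queries from low-texture regions contribute no cross-image update.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def texture_gate(queries, threshold):
    """Hypothetical gate (an assumption, not the paper's mask):
    1.0 if a query's feature variance exceeds the threshold, else 0.0."""
    gates = []
    for q in queries:
        mean = sum(q) / len(q)
        var = sum((v - mean) ** 2 for v in q) / len(q)
        gates.append(1.0 if var > threshold else 0.0)
    return gates


def masked_cross_attention(queries, keys, values, gates):
    """Scaled dot-product cross-attention whose per-query output is
    multiplied by a gate, so masked queries produce a zero update."""
    d = len(queries[0])
    out = []
    for q, g in zip(queries, gates):
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        attended = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        out.append([g * a for a in attended])
    return out


# Toy data: the first query is "textured" (high variance), the second is flat.
queries = [[1.0, -1.0], [0.5, 0.5]]
keys = [[1.0, -1.0], [0.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0]]

gates = texture_gate(queries, threshold=0.1)
updates = masked_cross_attention(queries, keys, values, gates)
```

In this toy case the flat query is gated out and receives a zero update, while the textured query attends mostly to the key that matches it. A learned, continuous mask would replace the hard 0/1 gate in practice.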

Original language: English
Pages (from-to): 3342-3354
Number of pages: 13
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 35
Issue number: 4
State: Published - 2025

Keywords

  • Deep learning
  • geometry-enhanced
  • homography estimation
  • image alignment
  • transformer
