RAGT: Learning Robust Features for Occluded Human Pose and Shape Estimation with Attention-Guided Transformer

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Abstract

3D human pose and shape estimation from monocular images is a fundamental task in computer vision, but it is highly ill-posed and challenging under occlusion. Occlusion arises when other objects block parts of the body from view: the image features become incomplete and ambiguous, leading to inaccurate or even wrong predictions. In this paper, we propose a novel method, named RAGT, that handles occlusion robustly and recovers the complete 3D pose and shape of humans. Our study focuses on learning robust feature representations for human pose and shape estimation in the presence of occlusion. To this end, we introduce a dual-branch architecture that learns incorporation weights to propagate features from visible parts to occluded parts and suppression weights to inhibit the integration of background features. To further improve the quality of the visible and occluded maps, we leverage pseudo ground-truth maps generated by DensePose for pixel-level supervision. Additionally, we propose a novel transformer-based module, the Contextual Occlusion-Aware Transformer (COAT), to effectively incorporate visible features into occluded regions. The COAT module is guided by an Occlusion-Guided Attention Loss (OGAL), designed to explicitly encourage it to fuse the most relevant features, those semantically and spatially closest to the occluded regions. We conduct experiments on various benchmarks and demonstrate the robustness of RAGT to different kinds of occluded scenes, both quantitatively and qualitatively.
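The abstract does not include code, but the visible-to-occluded fusion it describes for the COAT module follows the general pattern of cross-attention: occluded-region features act as queries over visible-region features, and the resulting attention weights decide how much of each visible feature to incorporate. The sketch below (all function and variable names are hypothetical, not from the paper) illustrates that pattern in plain NumPy under the assumption of scaled dot-product attention.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_visible_into_occluded(occ_feats, vis_feats):
    """Hypothetical sketch of cross-attention fusion.

    occ_feats: (n_occ, d) features of occluded regions (queries).
    vis_feats: (n_vis, d) features of visible regions (keys/values).
    Returns the fused occluded features and the attention map.
    """
    d = occ_feats.shape[-1]
    scores = occ_feats @ vis_feats.T / np.sqrt(d)   # (n_occ, n_vis)
    attn = softmax(scores, axis=-1)                 # each row sums to 1
    # Residual update: incorporate attended visible features.
    return occ_feats + attn @ vis_feats, attn
```

In the paper's setting, an occlusion-guided loss on `attn` would further push the weights toward visible features that are semantically and spatially close to each occluded region; this sketch shows only the unsupervised fusion step.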

Original language: English
Title of host publication: Computer-Aided Design and Computer Graphics - 18th International Conference, CAD/Graphics 2023, Proceedings
Editors: Shi-Min Hu, Yiyu Cai, Paul Rosin
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 329-347
Number of pages: 19
ISBN (Print): 9789819996650
DOIs
State: Published - 2024
Event: 18th International Conference on Computer-Aided Design and Computer Graphics, CAD/Graphics 2023 - Shanghai, China
Duration: 19 Aug 2023 - 21 Aug 2023

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14250 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 18th International Conference on Computer-Aided Design and Computer Graphics, CAD/Graphics 2023
Country/Territory: China
City: Shanghai
Period: 19/08/23 - 21/08/23

Keywords

  • Human Pose and Shape Estimation
  • Human Reconstruction
  • Transformer
