THANet: Transferring Human Pose Estimation to Animal Pose Estimation

  • Jincheng Liao
  • Jianzhong Xu*
  • Yunhang Shen
  • Shaohui Lin*

  *Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

6 Scopus citations

Abstract

Animal pose estimation (APE) advances the understanding of animal behavior. Vision-based APE has recently attracted extensive attention because it is contactless and requires no sensors attached to the animal. One of the main challenges in APE is the lack of high-quality keypoint annotations across animal species, since manually annotating animal keypoints is expensive and time-consuming. Existing works alleviate this problem by synthesizing APE data and by generating pseudo-labels for unlabeled animal images. However, feature representations learned from synthetic images cannot be directly transferred to real-world scenarios, and the generated pseudo-labels are usually noisy, which limits the model’s performance. To address this challenge, we propose a novel cross-domain vision transformer, termed THANet, to Transfer Human pose estimation to Animal pose estimation, motivated by the skeletal similarities that humans share with some animals. Inspired by the success of ViTPose in human pose estimation (HPE), we design a unified vision transformer encoder that extracts universal features for both animals and humans, followed by two task-specific decoders. We further introduce a simple but effective cross-domain discriminator to bridge the domain gap between human pose and animal pose. We evaluate THANet on the AP-10K and Animal-Pose benchmarks, and extensive experiments show that our method achieves promising performance. Specifically, the proposed vision transformer and cross-domain method significantly improve the model’s accuracy and generalization ability for APE.
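
The data flow described in the abstract (one shared encoder, two task-specific decoders, and a domain discriminator on the shared features) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not give layer types, sizes, losses, or the training scheme, so everything below is an assumption made for illustration. Small random linear maps stand in for the vision transformer and the decoders, the feature and keypoint dimensions are arbitrary, and binary cross-entropy is assumed as the discriminator loss. All class and function names are hypothetical.

```python
import math
import random

random.seed(0)  # fixed seed so the toy weights are reproducible


def make_linear(n_in, n_out):
    """Random weight matrix standing in for a learned layer (illustrative only)."""
    return [[random.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def apply_linear(weights, x):
    """Plain matrix-vector product."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


class SharedEncoder:
    """Stand-in for the unified encoder shared by humans and animals."""

    def __init__(self, n_in, n_feat):
        self.weights = make_linear(n_in, n_feat)

    def __call__(self, x):
        return [math.tanh(v) for v in apply_linear(self.weights, x)]


class PoseDecoder:
    """Task-specific head: one score per keypoint (real heads output heatmaps)."""

    def __init__(self, n_feat, n_keypoints):
        self.weights = make_linear(n_feat, n_keypoints)

    def __call__(self, feat):
        return apply_linear(self.weights, feat)


class DomainDiscriminator:
    """Predicts the probability that shared features came from a human image."""

    def __init__(self, n_feat):
        self.weights = make_linear(n_feat, 1)

    def __call__(self, feat):
        logit = apply_linear(self.weights, feat)[0]
        return 1.0 / (1.0 + math.exp(-logit))


def domain_bce(p_human, is_human):
    """Binary cross-entropy for the domain label (assumed loss, not from the paper)."""
    eps = 1e-12
    return -math.log(p_human + eps) if is_human else -math.log(1.0 - p_human + eps)


# Illustrative sizes only; none of these values come from the abstract.
N_IN, N_FEAT, N_HUMAN_KPTS, N_ANIMAL_KPTS = 8, 4, 17, 17

encoder = SharedEncoder(N_IN, N_FEAT)
decoders = {
    "human": PoseDecoder(N_FEAT, N_HUMAN_KPTS),
    "animal": PoseDecoder(N_FEAT, N_ANIMAL_KPTS),
}
discriminator = DomainDiscriminator(N_FEAT)


def forward(x, domain):
    """Route an input through the shared encoder and the decoder for its domain."""
    feat = encoder(x)
    keypoints = decoders[domain](feat)
    p_human = discriminator(feat)
    return keypoints, p_human, domain_bce(p_human, domain == "human")


animal_kpts, animal_p_human, animal_domain_loss = forward([0.5] * N_IN, "animal")
human_kpts, human_p_human, human_domain_loss = forward([0.2] * N_IN, "human")
```

The point of the sketch is the wiring: both domains pass through the same encoder, only the decoder differs per domain, and the discriminator sees the shared features. How the discriminator signal is combined with the pose loss during training is not stated in the abstract and is left out here.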

Original language: English
Article number: 4210
Journal: Electronics (Switzerland)
Volume: 12
Issue number: 20
State: Published - Oct 2023

Keywords

  • animal pose estimation
  • cross-domain
  • vision transformer
