Learning Torso Prior for Co-Speech Gesture Generation with Better Hand Shape

  • Hexiang Wang
  • Fengqi Liu
  • Ran Yi*
  • Lizhuang Ma*
  • *Corresponding author for this work
  • Shanghai Jiao Tong University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Co-speech gesture generation is the task of synthesizing gesture sequences synchronized with an input audio signal. Previous methods estimate the upper-body gesture as a whole, ignoring that audio maps differently to different body parts, which leads to poor overall results and, in particular, poor hand shapes. In this paper, we propose a novel three-branch co-speech gesture generation framework to obtain better results. In particular, we propose a Torso2Hand Prior Learning module (T2HPL) that leverages torso information as an extra prior to enhance hand pose prediction, and we carefully design a hand shape discriminator to improve the realism of the generated hand shapes. In addition, an arm orientation loss is designed to encourage the network to generate the torso part with better semantic expressiveness. Experiments on a dataset of four different speakers demonstrate the superiority of our method over state-of-the-art approaches.

Original language: English
Title of host publication: 2023 IEEE International Conference on Image Processing, ICIP 2023 - Proceedings
Publisher: IEEE Computer Society
Pages: 1-5
Number of pages: 5
ISBN (Electronic): 9781728198354
State: Published - 2023
Event: 30th IEEE International Conference on Image Processing, ICIP 2023 - Kuala Lumpur, Malaysia
Duration: 8 Oct 2023 → 11 Oct 2023

Publication series

Name: Proceedings - International Conference on Image Processing, ICIP
ISSN (Print): 1522-4880

Conference

Conference: 30th IEEE International Conference on Image Processing, ICIP 2023
Country/Territory: Malaysia
City: Kuala Lumpur
Period: 8/10/23 → 11/10/23

Keywords

  • adversarial learning
  • co-speech gesture generation
  • cross-modal learning
