EmoDiffGes: Emotion-Aware Co-Speech Holistic Gesture Generation with Progressive Synergistic Diffusion

Xinru Li, Jingzhong Lin, Bohao Zhang, Yuanyuan Qi, Changbo Wang, Gaoqi He

Research output: Contribution to journal › Article › peer-review

Abstract

Co-speech gesture generation, driven by emotional expression and synergistic bodily movements, is essential for applications such as virtual avatars and human-robot interaction. Existing co-speech gesture generation methods face two fundamental limitations: (1) they produce inexpressive gestures because they ignore the temporal evolution of emotion; (2) they generate incoherent, unnatural motions by either oversimplifying the body as a whole or modeling its parts independently. To address these limitations, we propose EmoDiffGes, a diffusion-based framework grounded in embodied emotion theory that unifies dynamic emotion conditioning with part-aware synergistic modeling. Specifically, a Dynamic Emotion-Alignment Module (DEAM) first extracts dynamic emotional cues and injects emotion guidance into the generation process. A Progressive Synergistic Gesture Generator (PSGG) then iteratively refines region-specific latent codes while maintaining full-body coordination, leveraging a Body Region Prior for part-specific encoding and a Progressive Inter-Region Synergistic Flow for global motion coherence. Extensive experiments validate the effectiveness of our method, showcasing its potential for generating expressive, coordinated, and emotionally grounded human gestures.
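
To make the progressive, part-aware conditioning idea concrete, the sketch below shows one plausible shape such a denoiser could take. It is not the authors' implementation: the region names, module names (RegionDenoiser, ProgressiveGestureDenoiser), feature dimensions, and the GRU-based context update are all illustrative assumptions standing in for the paper's Body Region Prior and Progressive Inter-Region Synergistic Flow. It only illustrates the general pattern of refining per-region latents in sequence while threading a shared full-body context, conditioned on speech and frame-level emotion features.

# Hypothetical sketch (not the paper's code): one denoising pass of a
# diffusion-style gesture model with per-region refinement and a shared
# full-body context carried across regions for coordination.
import torch
import torch.nn as nn

class RegionDenoiser(nn.Module):
    """Predicts the noise for one body region's latent code, conditioned on
    speech+emotion features and the running full-body context (illustrative)."""
    def __init__(self, latent_dim=64, cond_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + cond_dim + latent_dim, 256),
            nn.SiLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, z_region, cond, body_context):
        return self.net(torch.cat([z_region, cond, body_context], dim=-1))

class ProgressiveGestureDenoiser(nn.Module):
    """Illustrative stand-in for a progressive, part-aware generator: regions
    are refined one after another, each seeing an accumulated body context."""
    def __init__(self, regions=("face", "upper", "hands", "lower"),
                 latent_dim=64, cond_dim=128):
        super().__init__()
        self.regions = regions
        self.denoisers = nn.ModuleDict(
            {r: RegionDenoiser(latent_dim, cond_dim) for r in regions})
        self.context_init = nn.Parameter(torch.zeros(latent_dim))
        self.context_update = nn.GRUCell(latent_dim, latent_dim)

    def forward(self, z_regions, speech_feat, emotion_feat):
        # Condition on speech features fused with dynamic (frame-level) emotion cues.
        cond = torch.cat([speech_feat, emotion_feat], dim=-1)
        context = self.context_init.expand(cond.shape[0], -1).contiguous()
        eps_pred = {}
        for r in self.regions:
            eps = self.denoisers[r](z_regions[r], cond, context)
            eps_pred[r] = eps
            # Fold the refined region back into the shared context so later
            # regions stay coordinated with earlier ones.
            context = self.context_update(eps, context)
        return eps_pred

# Toy usage: batch of 2 frames, 64-dim region latents, 64-dim speech and emotion features.
model = ProgressiveGestureDenoiser()
z = {r: torch.randn(2, 64) for r in ("face", "upper", "hands", "lower")}
noise_pred = model(z, torch.randn(2, 64), torch.randn(2, 64))
print({r: v.shape for r, v in noise_pred.items()})

In an actual diffusion setup, the predicted per-region noise would feed a standard denoising update at each timestep; the key point illustrated here is only the ordering of region refinement around a shared context, not any specific design choice from the paper.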

Original language: English
Article number: e70261
Journal: Computer Graphics Forum
Volume: 44
Issue number: 7
DOIs
State: Published - Oct 2025

Keywords

  • Animation
  • Motion processing
  • CCS Concepts: Computing methodologies → Computer graphics

