Abstract
Co-speech gesture generation, driven by emotional expression and synergistic bodily movements, is essential for applications such as virtual avatars and human-robot interaction. Existing co-speech gesture generation methods face two fundamental limitations: (1) producing inexpressive gestures due to ignoring the temporal evolution of emotion; (2) generating incoherent and unnatural motions as a result of either holistic body oversimplification or independent part modeling. To address the above limitations, we propose EmoDiffGes, a diffusion-based framework grounded in embodied emotion theory, unifying dynamic emotion conditioning and part-aware synergistic modeling. Specifically, a Dynamic Emotion-Alignment Module (DEAM) is first applied to extract dynamic emotional cues and inject emotion guidance into the generation process. Then, a Progressive Synergistic Gesture Generator (PSGG) iteratively refines region-specific latent codes while maintaining full-body coordination, leveraging a Body Region Prior for part-specific encoding and Progressive Inter-Region Synergistic Flow for global motion coherence. Extensive experiments validate the effectiveness of our methods, showcasing the potential for generating expressive, coordinated, and emotionally grounded human gestures.
| Original language | English |
|---|---|
| Article number | e70261 |
| Journal | Computer Graphics Forum |
| Volume | 44 |
| Issue number | 7 |
| DOIs | |
| State | Published - Oct 2025 |
Keywords
- Animation
- CCS Concepts
- Motion processing
- • Computing methodologies → Computer graphics