TY - GEN
T1 - Learning Torso Prior for Co-Speech Gesture Generation with Better Hand Shape
AU - Wang, Hexiang
AU - Liu, Fengqi
AU - Yi, Ran
AU - Ma, Lizhuang
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
AB - Co-speech gesture generation is the task of synthesizing gesture sequences synchronized with an input audio signal. Previous methods estimate the upper-body gesture as a whole, ignoring the different mapping relations between audio and individual body parts, which leads to poor overall results and especially poor hand shapes. In this paper, we propose a novel three-branch co-speech gesture generation framework to obtain better results. In particular, we propose a Torso2Hand Prior Learning module (T2HPL) that leverages torso information as an extra prior to enhance hand pose prediction, and we carefully design a hand shape discriminator to improve the authenticity of the generated hand shapes. In addition, an arm orientation loss is designed to encourage the network to generate the torso part with better semantic expressiveness. Experiments on a dataset of four different speakers demonstrate the superiority of our method over state-of-the-art approaches.
KW - adversarial learning
KW - co-speech gesture generation
KW - cross-modal learning
UR - https://www.scopus.com/pages/publications/85180740422
U2 - 10.1109/ICIP49359.2023.10222259
DO - 10.1109/ICIP49359.2023.10222259
M3 - Conference contribution
AN - SCOPUS:85180740422
T3 - Proceedings - International Conference on Image Processing, ICIP
SP - 1
EP - 5
BT - 2023 IEEE International Conference on Image Processing, ICIP 2023 - Proceedings
PB - IEEE Computer Society
T2 - 30th IEEE International Conference on Image Processing, ICIP 2023
Y2 - 8 October 2023 through 11 October 2023
ER -