Abstract
Designing techniques to generate plausible hand reaching and grasping motions for objects is a long-standing problem with many applications in computer graphics, robotics, and virtual reality. Despite recent advances that rely on learning from real-world collections of hand-grasping motions, existing approaches remain limited in the diversity of motions they can generate. In this paper, we design a model that learns to generate rich and diverse reaching and grasping motions while producing plausible final grasps for an articulated human hand. Our design relies on a latent diffusion model (LDM) conditioned on a latent representation of grasp affordance. For the grasp affordance, we train a conditional variational autoencoder (cVAE) to learn a joint latent space for the target object and desirable grasps. We then use a large-scale synthetic dataset to train a Transformer-based VAE that learns a motion latent representation serving as a latent prior. Finally, our LDM takes this latent prior as input, uses the grasp affordance as a condition, and generates motions for new objects without retraining from scratch. Compared to baseline techniques that use cVAEs and concatenate conditions such as target-object features directly in the network, we demonstrate that an LDM fusing the motion latent representation with the grasp affordance latent achieves higher-quality results and also enables the generation of a greater variety of motions.
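The pipeline in the abstract (an affordance latent produced by a cVAE-style encoder, used to condition the reverse diffusion of a motion latent that a motion VAE would decode) can be illustrated with a toy sketch. This is not the authors' code: the dimensions, the linear stand-ins for the trained networks, and the DDPM-style schedule are all illustrative assumptions.

```python
import numpy as np

# Toy illustration of the pipeline described in the abstract. All names,
# shapes, and the linear "networks" below are hypothetical stand-ins for
# the paper's learned models.
rng = np.random.default_rng(0)

AFFORDANCE_DIM = 16   # latent size of the grasp-affordance cVAE (assumed)
MOTION_DIM = 32       # latent size of the Transformer motion VAE (assumed)
T = 10                # number of diffusion steps (toy value)

# Stand-in for the trained cVAE encoder: object/grasp features -> affordance latent.
W_aff = rng.normal(size=(64, AFFORDANCE_DIM)) / 8.0

def encode_affordance(object_and_grasp_feats):
    return np.tanh(object_and_grasp_feats @ W_aff)

# Stand-in denoiser: predicts noise from (noisy motion latent, step, condition).
# The conditioning is fused by simple concatenation here.
W_eps = rng.normal(size=(MOTION_DIM + AFFORDANCE_DIM + 1, MOTION_DIM)) / 8.0

def predict_noise(z_t, t, cond):
    inp = np.concatenate([z_t, cond, [t / T]])
    return inp @ W_eps

# Linear noise schedule and a standard DDPM-style reverse process.
betas = np.linspace(1e-4, 0.1, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def sample_motion_latent(cond):
    z = rng.normal(size=MOTION_DIM)          # start from Gaussian noise
    for t in reversed(range(T)):
        eps = predict_noise(z, t, cond)
        # DDPM posterior mean for step t-1
        z = (z - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            z = z + np.sqrt(betas[t]) * rng.normal(size=MOTION_DIM)
    return z

cond = encode_affordance(rng.normal(size=64))
motion_latent = sample_motion_latent(cond)
# In the full system, a trained motion VAE decoder would map this latent
# back to a hand reaching-and-grasping trajectory.
print(motion_latent.shape)
```

Because generation happens in latent space with the affordance only as a condition, a new target object changes just the conditioning vector, which is the mechanism that lets the trained LDM generalize without retraining from scratch.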
| Original language | English |
|---|---|
| Journal | Computer Graphics Forum |
| DOIs | |
| State | Accepted/In press - 2026 |
Keywords
- 3D hand motion
- grasping pose generation
- latent diffusion
Fingerprint
Dive into the research topics of 'Grasping Motion Generation Through Latent Diffusion Models'. Together they form a unique fingerprint.