Abstract
Designing techniques to generate plausible hand reaching and grasping motions for objects is a long-standing problem with many applications in computer graphics, robotics, and virtual reality. Despite recent advances that rely on learning from real-world collections of hand-grasping motions, existing approaches remain limited in the diversity of motions they can generate. In this paper, we design a model that learns to generate rich and diverse reaching and grasping motions while producing plausible final grasps for an articulated human hand. Our design relies on a latent diffusion model (LDM) conditioned on a latent representation of grasp affordance. For the grasp affordance, we train a conditional variational autoencoder (cVAE) to learn a latent space over both the target object and desirable grasps. We then use a large-scale synthetic dataset to train a Transformer-based VAE that learns a motion latent representation serving as a latent prior. Finally, our LDM takes this latent prior as input, uses the grasp affordance as a condition, and generates motions for new objects without retraining from scratch. Compared to baseline techniques that use cVAEs and concatenate conditions such as target-object features directly in the network, we demonstrate that fusing the motion latent representation with the grasp-affordance latent inside an LDM achieves higher-quality results and enables the generation of a greater variety of motions.
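The abstract describes conditioning a latent diffusion model on a grasp-affordance latent fused with a noised motion latent. The sketch below illustrates that general structure only; the fusion function, noise schedule, latent dimensions, and all names are hypothetical stand-ins, not details from the paper.

```python
import math
import random

def fuse(motion_latent, affordance_latent, w=0.5):
    """Toy stand-in for the learned fusion inside the LDM denoiser:
    a convex combination of the motion and grasp-affordance latents."""
    return [(1 - w) * m + w * a for m, a in zip(motion_latent, affordance_latent)]

def forward_noise(z0, t, T=1000):
    """DDPM-style forward noising of a motion latent:
    z_t = sqrt(abar_t) * z_0 + sqrt(1 - abar_t) * eps, eps ~ N(0, 1)."""
    abar = math.cos(0.5 * math.pi * t / T) ** 2  # cosine schedule (illustrative)
    return [math.sqrt(abar) * z + math.sqrt(1 - abar) * random.gauss(0.0, 1.0)
            for z in z0]

# Toy latents: z0 would come from the Transformer-based motion VAE,
# c from the grasp-affordance cVAE (dimensions are arbitrary here).
z0 = [0.1, -0.3, 0.7, 0.2]
c = [0.5, 0.0, -0.2, 0.4]

zt = forward_noise(z0, t=500)        # noised motion latent at step t
denoiser_input = fuse(zt, c)         # fused input fed to the denoiser
print(len(denoiser_input))           # same dimensionality as the motion latent
```

At sampling time such a denoiser would be applied iteratively from pure noise, with the affordance latent held fixed as the condition at every step.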
| Original language | English |
|---|---|
| Journal | Computer Graphics Forum |
| DOI | |
| Publication status | Accepted/In press - 2026 |
Fingerprint

Explore the research topics of 'Grasping Motion Generation Through Latent Diffusion Models'. Together they form a unique fingerprint.