Abstract
Human motion synthesis in 3D scenes relies heavily on scene comprehension, yet current methods focus mainly on scene structure and ignore semantic understanding. In this paper, we propose SSOMotion, a human motion synthesis framework that uses a unified Scene Semantic Occupancy (SSO) as its scene representation. We design a bi-directional triplane decomposition to derive a compact version of the SSO, and scene semantics are mapped to a unified feature space via CLIP encoding and a shared linear dimensionality reduction. This strategy captures fine-grained scene semantic structures while significantly reducing redundant computation. We further use these scene hints, together with the movement direction derived from instructions, to control motion via frame-wise scene queries. Extensive experiments and ablation studies on cluttered scenes built from ShapeNet furniture, as well as on scanned scenes from the PROX and Replica datasets, demonstrate the framework's cutting-edge performance and validate its effectiveness and generalization ability.
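To make the idea of a compact, plane-based semantic occupancy more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a labeled voxel grid, uses random vectors as stand-ins for CLIP text embeddings, and applies plain axis-aligned mean pooling rather than the bi-directional decomposition described in the abstract. The names `build_triplanes` and `query_point` are hypothetical.

```python
import numpy as np

def build_triplanes(labels, class_embed, proj):
    """Project a semantic occupancy grid onto three axis-aligned feature planes.

    labels:      (X, Y, Z) integer grid, 0 means empty (assumed convention)
    class_embed: (C, D) per-class embeddings (stand-in for CLIP features)
    proj:        (D, d) shared linear reduction applied to every class
    """
    reduced = class_embed @ proj          # (C, d) shared dimensionality reduction
    reduced[0] = 0.0                      # empty voxels carry no semantic feature
    feats = reduced[labels]               # (X, Y, Z, d) per-voxel features
    occ = (labels > 0)[..., None]         # (X, Y, Z, 1) occupancy mask
    planes = {}
    for name, axis in (("yz", 0), ("xz", 1), ("xy", 2)):
        # Average features over occupied voxels along the collapsed axis.
        count = np.maximum(occ.sum(axis=axis), 1)
        planes[name] = feats.sum(axis=axis) / count
    return planes

def query_point(planes, idx):
    """Sum the three plane features at an integer voxel index (x, y, z)."""
    x, y, z = idx
    return planes["xy"][x, y] + planes["xz"][x, z] + planes["yz"][y, z]

# Toy scene: one occupied voxel of class 2 in a 4 x 5 x 6 grid.
rng = np.random.default_rng(0)
labels = np.zeros((4, 5, 6), dtype=int)
labels[1, 2, 3] = 2
class_embed = rng.normal(size=(3, 16))    # hypothetical embeddings, not real CLIP output
proj = rng.normal(size=(16, 4))
planes = build_triplanes(labels, class_embed, proj)
scene_hint = query_point(planes, (1, 2, 3))
```

In this sketch the three planes store O(XY + XZ + YZ) feature vectors instead of O(XYZ), which is the kind of saving a plane-based decomposition is meant to provide; a frame-wise query would call `query_point` at the body positions of each motion frame.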
| Original language | English |
|---|---|
| Pages (from-to) | 4248-4256 |
| Number of pages | 9 |
| Journal | Proceedings of the AAAI Conference on Artificial Intelligence |
| Volume | 40 |
| Issue number | 6 |
| DOIs | |
| State | Published - 2026 |
| Event | 40th AAAI Conference on Artificial Intelligence, AAAI 2026 - Singapore, Singapore. Duration: 20 Jan 2026 → 27 Jan 2026 |
Fingerprint
Dive into the research topics of 'Human Motion Synthesis in 3D Scenes via Unified Scene Semantic Occupancy'. Together they form a unique fingerprint.