TY - JOUR
T1 - Human Motion Synthesis in 3D Scenes via Unified Scene Semantic Occupancy
AU - Gong, Jingyu
AU - Tong, Kunkun
AU - Chen, Zhuoran
AU - Yuan, Chuanhan
AU - Chen, Mingang
AU - Zhang, Zhizhong
AU - Tan, Xin
AU - Xie, Yuan
N1 - Publisher Copyright: © 2026, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2026
Y1 - 2026
AB - Human motion synthesis in 3D scenes relies heavily on scene comprehension, yet current methods focus mainly on scene structure and ignore semantic understanding. In this paper, we propose SSOMotion, a human motion synthesis framework that adopts a unified Scene Semantic Occupancy (SSO) as its scene representation. We design a bi-directional triplane decomposition to derive a compact version of the SSO, and map scene semantics to a unified feature space via CLIP encoding and a shared linear dimensionality reduction. This strategy captures fine-grained scene semantic structures while significantly reducing redundant computation. We further use these scene hints, together with the movement direction derived from instructions, for motion control via frame-wise scene queries. Extensive experiments and ablation studies on cluttered scenes composed of ShapeNet furniture, as well as on scanned scenes from the PROX and Replica datasets, demonstrate cutting-edge performance and validate the framework's effectiveness and generalization ability.
UR - https://www.scopus.com/pages/publications/105034565329
U2 - 10.1609/aaai.v40i6.42421
DO - 10.1609/aaai.v40i6.42421
M3 - Conference article
AN - SCOPUS:105034565329
SN - 2159-5399
VL - 40
SP - 4248
EP - 4256
JO - Proceedings of the AAAI Conference on Artificial Intelligence
JF - Proceedings of the AAAI Conference on Artificial Intelligence
IS - 6
T2 - 40th AAAI Conference on Artificial Intelligence, AAAI 2026
Y2 - 20 January 2026 through 27 January 2026
ER -