TY - JOUR
T1 - OSH-Splat: optimizable semantic hyperplanes for enhanced 3D language feature Gaussian splatting
AU - Xu, Ruijie
AU - Ji, Yuzhou
AU - Tan, Xin
AU - Ma, Lizhuang
N1 - Publisher Copyright: © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2025.
PY - 2025/10
Y1 - 2025/10
AB - With rapid advances in computer vision, building 3D language fields that support open-vocabulary queries in 3D has received increasing attention. This article introduces OSH-Splat, which constructs a 3D language field that allows accurate and efficient open-vocabulary queries in 3D space. First, we use the Segment Anything Model (SAM) to extract hierarchical semantic information at three levels: subpart, part, and whole. This both resolves ambiguity about the query target and produces pixel-aligned CLIP embeddings. Then, to reduce memory consumption, we employ a scene-specific encoder-decoder pair to compress these embeddings. In the second stage of training, the semantic features are learned as 3D Gaussian splatting features, extending the 3D language field to support semantic queries. Furthermore, we propose the optimizable semantic hyperplane (OSH), a query strategy for our 3D language feature Gaussians that replaces the fixed empirical thresholds used by previous methods and yields better accuracy and robustness in 3D semantic segmentation. For each text query, the OSH is iteratively optimized under the guidance of a referring expression segmentation model to localize the target region accurately. Extensive experiments show that our approach outperforms state-of-the-art methods.
KW - 3D Gaussian splatting
KW - 3D scene understanding
KW - Hyperplane
KW - Open vocabulary
UR - https://www.scopus.com/pages/publications/105009986094
U2 - 10.1007/s00371-025-04091-5
DO - 10.1007/s00371-025-04091-5
M3 - Article
AN - SCOPUS:105009986094
SN - 0178-2789
VL - 41
SP - 11127
EP - 11137
JO - Visual Computer
JF - Visual Computer
IS - 13
ER -