Open-Scene Understanding-oriented 3D Scene Graph Generation

Yuansu Hao, Fei Yu*, Yanhao Wang*, Yuehua Li, Quan Deng, Yuan Yu, Chen Huang, Nan Che

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Understanding complex 3D environments is essential for many computer vision and robotic applications, especially in highly dynamic open-scene scenarios. The 3D scene graph plays an important role in the comprehension of 3D environments. However, most existing methods for 3D scene graph generation depend on pre-specified object and relationship classes (i.e., closed vocabulary) and labeled data for training, which restricts their effectiveness in the open-scene setting. To address this issue, we propose a novel Open-Scene Understanding-oriented 3D Scene Graph (OSU-3DSG) framework that can operate without labeled training data. The OSU-3DSG framework effectively extracts visual features from RGB-D image sequences and fuses them with camera pose estimates to create accurate 3D object maps. Then, by leveraging a pre-trained Vision Language Model (VLM), it generates relational triplets and constructs 3D scene graphs in a zero-shot manner. In particular, it excels at adaptively recognizing and interpreting object relationships, making it suitable for open-world applications. Finally, we perform extensive experiments on two open-world 3D datasets, namely 3DSSG and Replica, to evaluate the effectiveness and adaptability of the OSU-3DSG framework, demonstrating its potential to pave the way for the advancement of open-scene understanding. Our code and data are published at https://github.com/YuansuHao/OSU-3DSG.
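The two stages named in the abstract (fusing depth observations with camera poses, then assembling relational triplets into a graph) can be sketched roughly as follows. This is a minimal illustration, not the OSU-3DSG implementation: the function names `backproject` and `build_scene_graph`, the pinhole intrinsics, the camera-to-world pose convention, and the sample triplets (which stand in for VLM output) are all assumptions made for this example.

```python
from collections import defaultdict


def backproject(u, v, depth, fx, fy, cx, cy, pose):
    """Lift pixel (u, v) with metric depth into world coordinates.

    Assumes a pinhole camera and a 4x4 camera-to-world pose given as
    nested lists (a hypothetical convention chosen for this sketch).
    """
    # Pinhole model: recover the camera-frame point from pixel and depth.
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    cam = (x, y, depth, 1.0)
    # Apply the rigid transform using the first three rows of the pose.
    return tuple(sum(pose[r][c] * cam[c] for c in range(4)) for r in range(3))


def build_scene_graph(triplets):
    """Turn (subject, predicate, object) triplets into an adjacency map."""
    graph = defaultdict(list)
    for subj, pred, obj in triplets:
        graph[subj].append((pred, obj))
        # Register the object as a node even if it has no outgoing edges.
        graph.setdefault(obj, [])
    return dict(graph)


# Hypothetical pose: identity rotation, camera shifted 0.5 m along world x.
pose = [
    [1.0, 0.0, 0.0, 0.5],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
point = backproject(320, 240, 2.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0, pose=pose)

# Hypothetical triplets standing in for what a VLM might return.
triplets = [("cup", "on", "table"), ("chair", "next to", "table")]
scene_graph = build_scene_graph(triplets)
```

Here the centre pixel at 2 m depth lands at (0.5, 0.0, 2.0) in the world frame, and `scene_graph` maps each object to its outgoing (predicate, object) edges. Because predicates are kept as free-form strings, the relationship vocabulary stays open rather than being fixed to a set of classes.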

Original language: English
Title of host publication: 2025 IEEE International Conference on Multimedia and Expo
Subtitle of host publication: Journey to the Center of Machine Imagination, ICME 2025 - Conference Proceedings
Publisher: IEEE Computer Society
ISBN (Electronic): 9798331594954
DOIs
State: Published - 2025
Event: 2025 IEEE International Conference on Multimedia and Expo, ICME 2025 - Nantes, France
Duration: 30 Jun 2025 to 4 Jul 2025

Publication series

Name: Proceedings - IEEE International Conference on Multimedia and Expo
ISSN (Print): 1945-7871
ISSN (Electronic): 1945-788X

Conference

Conference: 2025 IEEE International Conference on Multimedia and Expo, ICME 2025
Country/Territory: France
City: Nantes
Period: 30/06/25 to 4/07/25

Keywords

  • 3D scene graph generation
  • open-scene understanding
  • vision language model
  • zero-shot learning
