
Leveraging Panoptic Prior for 3D Zero-Shot Semantic Understanding Within Language Embedded Radiance Fields

  • Yuzhou Ji
  • Xin Tan*
  • He Zhu
  • Wuyi Liu
  • Jiachen Xu
  • Yuan Xie
  • Lizhuang Ma
  • *Corresponding author for this work
  • East China Normal University

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

Language Embedded Radiance Fields (LERF) achieves promising results in producing real-time dense relevancy maps within NeRF 3D scenes. Although LERF shows impressive zero-shot ability on many long-tail open-vocabulary queries, the quality of its relevancy maps can degrade at certain camera angles, especially in novel views, and it may even fail to localize the query. In this work, we propose a method that introduces prior knowledge to guide the construction of a multi-scale CLIP (Contrastive Language-Image Pretraining) feature pyramid, achieving better localization and 3D consistency without harming the original zero-shot capability. Specifically, we use panoptic segmentation to preprocess the training images and reconstruct the multi-scale image pyramid from segmented tiles. Unlike some other works, we use only the continuous semantic meaning of image tiles to obtain accurate CLIP features, rather than labels or IDs, which are inconsistent across views. The tiles are partially overridden based on location and scale, which also preserves a large amount of non-prior knowledge. To compare results with LERF effectively, we design a metric based on pixel relevancy, which could also support future research built on the LERF representation. Additionally, we explore the possibility of grounding dense, 3D-consistent segmentation information within LERF during experiments, offering a promising direction for distilling 2D knowledge into 3D scenes for 3D manipulation.
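The tile-pyramid idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the actual override rule, so everything below is an assumption made for illustration: the function name `build_tile_pyramid`, the non-overlapping tiling, the `coverage_threshold` parameter, and the rule that a tile is masked to its dominant segment only when that segment covers enough of the tile. The CLIP encoding step that would follow is omitted.

```python
import numpy as np

def build_tile_pyramid(image, segment_ids, tile_sizes=(4, 8), coverage_threshold=0.6):
    """Return {tile_size: [(row, col, tile, overridden), ...]}.

    Hypothetical rule (an assumption, not the paper's): a tile is overridden
    by its segment-masked version when a single segment covers at least
    `coverage_threshold` of the tile; otherwise the raw tile is kept, so
    non-prior image content is preserved.
    """
    height, width = segment_ids.shape
    pyramid = {}
    for size in tile_sizes:
        tiles = []
        # Non-overlapping tiles at this scale (a simplification).
        for row in range(0, height - size + 1, size):
            for col in range(0, width - size + 1, size):
                tile = image[row:row + size, col:col + size].copy()
                seg = segment_ids[row:row + size, col:col + size]
                ids, counts = np.unique(seg, return_counts=True)
                dominant = ids[np.argmax(counts)]
                coverage = counts.max() / seg.size
                overridden = bool(coverage >= coverage_threshold)
                if overridden:
                    # Zero out pixels that do not belong to the dominant segment.
                    tile[seg != dominant] = 0
                tiles.append((row, col, tile, overridden))
        pyramid[size] = tiles
    return pyramid

# Toy input: an 8x8 RGB image of ones; segment 1 fills columns 0-5,
# segment 2 fills columns 6-7. Segment IDs are used only as masks here.
image = np.ones((8, 8, 3), dtype=np.float32)
segment_ids = np.ones((8, 8), dtype=np.int64)
segment_ids[:, 6:] = 2

pyramid = build_tile_pyramid(image, segment_ids)
```

In this sketch the segment IDs serve only to mask pixels and are never stored as labels, in the spirit of the abstract's remark that labels and IDs are inconsistent across views. In a full pipeline, each returned tile would presumably be passed to a CLIP image encoder to obtain the per-scale features.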

Original language: English
Title of host publication: Computational Visual Media - 12th International Conference, CVM 2024, Proceedings
Editors: Fang-Lue Zhang, Andrei Sharf
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 42-58
Number of pages: 17
ISBN (Print): 9789819720941
DOI
Publication status: Published - 2024
Event: 12th International Conference on Computational Visual Media, CVM 2024 - Wellington, New Zealand
Duration: 10 Apr 2024 to 12 Apr 2024

Publication series

Name: Lecture Notes in Computer Science
Volume: 14592 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 12th International Conference on Computational Visual Media, CVM 2024
Country/Territory: New Zealand
City: Wellington
Period: 10/04/24 to 12/04/24

