TY - JOUR
T1 - Relationship Prompt Learning is Enough for Open-Vocabulary Semantic Segmentation
AU - Li, Jiahao
AU - Lu, Yang
AU - Xie, Yuan
AU - Qu, Yanyun
N1 - Publisher Copyright: © 2024 Neural Information Processing Systems Foundation. All rights reserved.
PY - 2024
Y1 - 2024
N2 - Open-vocabulary semantic segmentation (OVSS) aims to segment unseen classes without corresponding labels. Existing Vision-Language Model (VLM)-based methods leverage the VLM's rich knowledge to enhance additional explicit segmentation-specific networks, yielding competitive results but at the expense of extensive training. To reduce this cost, we attempt to enable the VLM to produce segmentation results directly, without any segmentation-specific networks. Prompt learning offers a direct and parameter-efficient approach, yet it falls short in guiding the VLM toward pixel-level visual classification. Therefore, we propose the Relationship Prompt Module (RPM), which generates a relationship prompt that directs the VLM to extract pixel-level semantic embeddings suitable for OVSS. Moreover, RPM integrates with the VLM to construct the Relationship Prompt Network (RPN), achieving OVSS without any segmentation-specific networks. RPN attains state-of-the-art performance with only about 3M trainable parameters (2% of total parameters).
AB - Open-vocabulary semantic segmentation (OVSS) aims to segment unseen classes without corresponding labels. Existing Vision-Language Model (VLM)-based methods leverage the VLM's rich knowledge to enhance additional explicit segmentation-specific networks, yielding competitive results but at the expense of extensive training. To reduce this cost, we attempt to enable the VLM to produce segmentation results directly, without any segmentation-specific networks. Prompt learning offers a direct and parameter-efficient approach, yet it falls short in guiding the VLM toward pixel-level visual classification. Therefore, we propose the Relationship Prompt Module (RPM), which generates a relationship prompt that directs the VLM to extract pixel-level semantic embeddings suitable for OVSS. Moreover, RPM integrates with the VLM to construct the Relationship Prompt Network (RPN), achieving OVSS without any segmentation-specific networks. RPN attains state-of-the-art performance with only about 3M trainable parameters (2% of total parameters).
UR - https://www.scopus.com/pages/publications/105000473018
M3 - Conference article
AN - SCOPUS:105000473018
SN - 1049-5258
VL - 37
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
T2 - 38th Conference on Neural Information Processing Systems, NeurIPS 2024
Y2 - 9 December 2024 through 15 December 2024
ER -