TY - GEN
T1 - ITJP
T2 - 2025 IEEE International Conference on Multimedia and Expo, ICME 2025
AU - Zhu, Ziwei
AU - Zhang, Xinzhu
AU - Zhao, Zhikang
AU - Zhao, Jing
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Multiple instance learning has achieved remarkable results in whole slide image (WSI) classification. However, constrained by patient privacy and the scarcity of cancer data, the insufficient quantity of data poses a challenge in training models with the extensive parameters required for large-pixel WSIs, giving rise to the need for few-shot WSI classification. Recently, pre-trained vision-language models (pre-VLMs) have shown strong performance in few-shot WSI classification tasks owing to their transfer-learning and few-shot capabilities. Therefore, we propose ITJP, an Image and Text Joint Prompts method for few-shot whole slide image classification, which constructs a few-shot WSI classification model under the multiple instance learning framework. ITJP utilizes the pre-VLM to extract instance features under the unsupervised setting, constructs image prompts to guide the aggregation of instance features into bag features, and subsequently guides the classification of bag features by text prompts. Specifically, we propose an image-prototype-guided aggregation strategy, where the image prototype is obtained by clustering patches extracted from representative WSIs. The image prototype guides the aggregation of instance features and is directly compared with the image patches to obtain more accurate similarity measures, thereby enhancing the focus on classification-relevant features. Furthermore, a low-rank linear transformation is designed to obtain a variable image prototype, enhancing adaptability to a specific WSI classification task and enabling effective adaptation to the few-shot scenario. We conduct extensive experiments on two WSI datasets to demonstrate the strong performance of ITJP for few-shot WSI classification.
AB - Multiple instance learning has achieved remarkable results in whole slide image (WSI) classification. However, constrained by patient privacy and the scarcity of cancer data, the insufficient quantity of data poses a challenge in training models with the extensive parameters required for large-pixel WSIs, giving rise to the need for few-shot WSI classification. Recently, pre-trained vision-language models (pre-VLMs) have shown strong performance in few-shot WSI classification tasks owing to their transfer-learning and few-shot capabilities. Therefore, we propose ITJP, an Image and Text Joint Prompts method for few-shot whole slide image classification, which constructs a few-shot WSI classification model under the multiple instance learning framework. ITJP utilizes the pre-VLM to extract instance features under the unsupervised setting, constructs image prompts to guide the aggregation of instance features into bag features, and subsequently guides the classification of bag features by text prompts. Specifically, we propose an image-prototype-guided aggregation strategy, where the image prototype is obtained by clustering patches extracted from representative WSIs. The image prototype guides the aggregation of instance features and is directly compared with the image patches to obtain more accurate similarity measures, thereby enhancing the focus on classification-relevant features. Furthermore, a low-rank linear transformation is designed to obtain a variable image prototype, enhancing adaptability to a specific WSI classification task and enabling effective adaptation to the few-shot scenario. We conduct extensive experiments on two WSI datasets to demonstrate the strong performance of ITJP for few-shot WSI classification.
KW - Few-shot Learning
KW - Vision-Language Model
KW - Whole Slide Image Classification
UR - https://www.scopus.com/pages/publications/105022642325
U2 - 10.1109/ICME59968.2025.11209047
DO - 10.1109/ICME59968.2025.11209047
M3 - Conference contribution
AN - SCOPUS:105022642325
T3 - Proceedings - IEEE International Conference on Multimedia and Expo
BT - 2025 IEEE International Conference on Multimedia and Expo
PB - IEEE Computer Society
Y2 - 30 June 2025 through 4 July 2025
ER -