ITJP: Image and Text Joint Prompts for Few-Shot Whole Slide Image Classification

Ziwei Zhu, Xinzhu Zhang, Zhikang Zhao, Jing Zhao*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Multiple instance learning has achieved remarkable results in whole slide image (WSI) classification. However, patient privacy constraints and the scarcity of cancer data limit the amount of available training data. This makes it difficult to train models with the large number of parameters required for high-resolution WSIs and motivates few-shot WSI classification. Recently, pre-trained vision-language models (pre-VLMs) have shown clear advantages in few-shot WSI classification owing to their strong transfer-learning and few-shot capabilities. We therefore propose ITJP (Image and Text Joint Prompts), a method that builds a few-shot WSI classification model within the multiple instance learning framework. ITJP uses a pre-VLM to extract instance features in an unsupervised setting, constructs image prompts that guide the aggregation of instance features into bag features, and then uses text prompts to guide the classification of the bag features. Specifically, we propose an image-prototype-guided aggregation strategy, in which the image prototype is obtained by clustering patches extracted from representative WSIs. The image prototype guides the aggregation of instance features and is compared directly with the image patches to obtain a more accurate similarity, thereby strengthening the focus on classification-relevant features. Furthermore, we design a low-rank linear transformation to obtain variable image prototypes, which improves adaptability to specific WSI classification tasks and enables effective adaptation to the few-shot scenario. Extensive experiments on two WSI datasets demonstrate the strong performance of ITJP for few-shot WSI classification.
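The pipeline summarized in the abstract can be sketched roughly as follows. This is a minimal, dependency-free Python illustration under stated assumptions, not the authors' implementation. Plain k-means stands in for the clustering step. Prototype-guided aggregation is modelled as a softmax over each patch's best cosine similarity to any prototype. The low-rank transformation is assumed to take the residual form p + B(Ap). Toy 2-D vectors stand in for the pre-VLM patch and text-prompt embeddings. The function names (`kmeans`, `low_rank_adapt`, `aggregate`, `classify`) and the temperature `tau` are hypothetical, and training of the low-rank matrices on the few-shot data is omitted.

```python
import math
import random
from typing import List, Sequence

Vec = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)) + 1e-12)


def kmeans(points: List[Vec], k: int, iters: int = 20, seed: int = 0) -> List[Vec]:
    """Lloyd's k-means over patch features; the centroids act as image prototypes."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters: List[List[Vec]] = [[] for _ in range(k)]
        for p in points:
            # Assign each patch feature to its nearest centroid (squared Euclidean).
            j = min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            clusters[j].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def matvec(m: List[Vec], v: Sequence[float]) -> Vec:
    return [dot(row, v) for row in m]


def low_rank_adapt(protos: List[Vec], down: List[Vec], up: List[Vec]) -> List[Vec]:
    """Assumed residual form p' = p + up @ (down @ p), with down (r x d) and up (d x r)."""
    return [[pi + di for pi, di in zip(p, matvec(up, matvec(down, p)))] for p in protos]


def aggregate(instances: List[Vec], protos: List[Vec], tau: float = 0.1) -> Vec:
    """Prototype-guided pooling: softmax over each patch's best prototype similarity."""
    scores = [max(cosine(x, p) for p in protos) / tau for x in instances]
    top = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * x[d] for w, x in zip(weights, instances)) / total
            for d in range(len(instances[0]))]


def classify(bag: Vec, text_embeds: List[Vec]) -> int:
    """Return the index of the class-prompt embedding most similar to the bag feature."""
    sims = [cosine(bag, t) for t in text_embeds]
    return max(range(len(sims)), key=sims.__getitem__)


if __name__ == "__main__":
    # Toy 2-D "features": tumor-like patches point along x, normal ones along y.
    protos = kmeans([[1.0, 0.2], [1.1, 0.2], [1.0, -0.2], [1.1, -0.2]], k=2)
    # Rank-1 adaptation (down is 1 x 2, up is 2 x 1), zero-initialized so the
    # prototypes start unchanged; in practice these would be learned.
    protos = low_rank_adapt(protos, [[0.0, 0.0]], [[0.0], [0.0]])
    slide = [[0.0, 1.0], [0.1, 0.9], [0.95, 0.05]]
    text = [[0.0, 1.0], [1.0, 0.0]]  # stand-ins for "normal" / "tumor" prompt embeddings
    print("predicted class:", classify(aggregate(slide, protos), text))
```

On this toy slide, plain mean pooling would give the bag [0.35, 0.65], which `classify` assigns to the "normal" prompt. Prototype-guided weighting instead lets the single tumor-like patch dominate the bag feature. Zero-initializing `down` and `up` keeps the adapted prototypes equal to the clustered ones at the start, and the transformation adds only 2·d·r parameters. That small footprint is one plausible reason a low-rank form suits the few-shot setting.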

Original language: English
Title of host publication: 2025 IEEE International Conference on Multimedia and Expo
Subtitle of host publication: Journey to the Center of Machine Imagination, ICME 2025 - Conference Proceedings
Publisher: IEEE Computer Society
ISBN (Electronic): 9798331594954
DOIs:
State: Published - 2025
Event: 2025 IEEE International Conference on Multimedia and Expo, ICME 2025 - Nantes, France
Duration: 30 Jun 2025 → 4 Jul 2025

Publication series

Name: Proceedings - IEEE International Conference on Multimedia and Expo
ISSN (Print): 1945-7871
ISSN (Electronic): 1945-788X

Conference

Conference: 2025 IEEE International Conference on Multimedia and Expo, ICME 2025
Country/Territory: France
City: Nantes
Period: 30/06/25 → 4/07/25

Keywords

  • Few-shot Learning
  • Vision-Language Model
  • Whole Slide Image Classification
