Gene-Induced Multimodal Pre-training for Image-Omic Classification

  • Ting Jin
  • Xingran Xie
  • Renjie Wan
  • Qingli Li
  • Yan Wang*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

9 Scopus citations

Abstract

Histology analysis of the tumor micro-environment integrated with genomic assays is the gold standard for most cancers in modern medicine. This paper proposes a Gene-induced Multimodal Pre-training (GiMP) framework, which jointly incorporates genomics and Whole Slide Images (WSIs) for classification tasks. Our work addresses two main challenges of multimodal image-omic classification: (1) the difficulty of extracting patient-level features from gigapixel WSIs and tens of thousands of genes, and (2) effective fusion that models high-order relevance. Concretely, we first propose a group multi-head self-attention gene encoder to capture global structured features in gene expression cohorts. We then design a masked patch modeling (MPM) paradigm to capture the latent pathological characteristics of different tissues. The masking strategy randomly masks a fixed-length contiguous subsequence of a WSI's patch embeddings. Finally, we combine the classification tokens of the paired modalities and propose a triplet learning module to learn high-order relevance and discriminative patient-level information. After pre-training, simple fine-tuning is enough to obtain the classification results. Experimental results on the TCGA dataset show the superiority of our network architectures and our pre-training framework, achieving 99.47% accuracy for image-omic classification. The code is publicly available at https://github.com/huangwudiduan/GIMP.
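
The masking step described in the abstract (hiding one fixed-length contiguous run of patch embeddings) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `mask_contiguous_span` and `apply_mask`, the shared `mask_token`, and the use of plain Python lists in place of tensors are all assumptions made for illustration.

```python
import random


def mask_contiguous_span(num_patches, span_len, rng=None):
    """Return a boolean mask with one randomly placed contiguous span set to True.

    Hypothetical helper: the span length is fixed, only its start is random.
    """
    if not 0 <= span_len <= num_patches:
        raise ValueError("span_len must be between 0 and num_patches")
    if rng is None:
        rng = random.Random()
    # Pick the start so that the whole span fits inside the sequence.
    start = rng.randint(0, num_patches - span_len)
    return [start <= i < start + span_len for i in range(num_patches)]


def apply_mask(embeddings, mask, mask_token):
    """Replace every masked patch embedding with a shared mask token (assumed)."""
    return [mask_token if masked else emb for emb, masked in zip(embeddings, mask)]


# Usage: mask 4 of 10 placeholder "embeddings" (strings stand in for vectors).
patches = [f"p{i}" for i in range(10)]
mask = mask_contiguous_span(len(patches), span_len=4, rng=random.Random(0))
print(apply_mask(patches, mask, mask_token="[MASK]"))
```

In a pre-training setup of this kind, the model would typically be asked to recover the hidden patches from the visible ones; the abstract does not say what the reconstruction target is, so the sketch stops at the masking itself.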

Original language: English
Title of host publication: Medical Image Computing and Computer Assisted Intervention – MICCAI 2023 - 26th International Conference, Proceedings
Editors: Hayit Greenspan, Anant Madabhushi, Parvin Mousavi, Septimiu Salcudean, James Duncan, Tanveer Syeda-Mahmood, Russell Taylor
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 508-517
Number of pages: 10
ISBN (Print): 9783031439865
State: Published - 2023
Event: 26th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2023 - Vancouver, Canada
Duration: 8 Oct 2023 to 12 Oct 2023

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14225 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 26th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2023
Country/Territory: Canada
City: Vancouver
Period: 8/10/23 to 12/10/23

Keywords

  • Multimodal learning
  • Whole slide image classification
