Feature Pyramid Vision Transformer for MedMNIST Classification Decathlon

Jinwei Liu, Yan Li, Guitao Cao, Yong Liu, Wenming Cao

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

19 Scopus citations

Abstract

MedMNIST is a medical dataset proposed to block the need for medical knowledge, but there is currently no model that can generalize well on all its sub-datasets. Owing to the inadequacy of long-range relation modeling, models based on convolutional neural networks (CNNs) cannot fully learn the information of images. Besides, relying only on high-level features limits the generalization effect as well. All of these remain challenges for MedMNIST Classification Decathlon. In this paper, we proposed Feature Pyramid Vision Transformer (FPViT), a strong alternative for MedMNIST Classification Decathlon. Our FPViT exhibits enhanced feature learning and modeling capabilities, which merits both residual network (ResNet) and Vision Transformer (ViT). Transformers in our model take the features extracted by ResNet as sequences to capture global contexts which compensate for the lack of locality of convolution operations. Moreover, the feature pyramid designed in our model effectively utilizes the multi-scale feature maps from basic layers of ResNet. These multi-scale features from low-level to high level enable our model to have better adaptability. And, the final prediction is based on the multi-scale ViT and the original ResNet heads. Through experiments, our FPViT can achieve superior classification and generalization on MedMNIST than state-of-the-art methods.

Original languageEnglish
Title of host publication2022 International Joint Conference on Neural Networks, IJCNN 2022 - Proceedings
PublisherInstitute of Electrical and Electronics Engineers Inc.
ISBN (Electronic)9781728186719
DOIs
StatePublished - 2022
Event2022 International Joint Conference on Neural Networks, IJCNN 2022 - Padua, Italy
Duration: 18 Jul 202223 Jul 2022

Publication series

NameProceedings of the International Joint Conference on Neural Networks
ISSN (Print)2161-4393
ISSN (Electronic)2161-4407

Conference

Conference2022 International Joint Conference on Neural Networks, IJCNN 2022
Country/TerritoryItaly
CityPadua
Period18/07/2223/07/22

Keywords

  • MedMNIST
  • Medical image analysis
  • Multi-scale
  • Vision Transformer

Fingerprint

Dive into the research topics of 'Feature Pyramid Vision Transformer for MedMNIST Classification Decathlon'. Together they form a unique fingerprint.

Cite this