TY - GEN
T1 - Feature Pyramid Vision Transformer for MedMNIST Classification Decathlon
AU - Liu, Jinwei
AU - Li, Yan
AU - Cao, Guitao
AU - Liu, Yong
AU - Cao, Wenming
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - MedMNIST is a medical image dataset designed to remove the need for medical background knowledge, but no current model generalizes well across all of its sub-datasets. Owing to their inadequate modeling of long-range relations, models based on convolutional neural networks (CNNs) cannot fully learn the information in images. Relying only on high-level features also limits generalization. These issues remain challenges for the MedMNIST Classification Decathlon. In this paper, we propose the Feature Pyramid Vision Transformer (FPViT), a strong alternative for the MedMNIST Classification Decathlon. FPViT combines the merits of the residual network (ResNet) and the Vision Transformer (ViT), which gives it enhanced feature learning and modeling capabilities. The Transformers in our model take the features extracted by ResNet as sequences to capture global context, which compensates for the locality of convolution operations. Moreover, the feature pyramid in our model effectively utilizes the multi-scale feature maps from the basic layers of ResNet. These multi-scale features, ranging from low level to high level, give our model better adaptability. The final prediction is based on the heads of both the multi-scale ViT and the original ResNet. Experiments show that FPViT achieves better classification and generalization on MedMNIST than state-of-the-art methods.
AB - MedMNIST is a medical image dataset designed to remove the need for medical background knowledge, but no current model generalizes well across all of its sub-datasets. Owing to their inadequate modeling of long-range relations, models based on convolutional neural networks (CNNs) cannot fully learn the information in images. Relying only on high-level features also limits generalization. These issues remain challenges for the MedMNIST Classification Decathlon. In this paper, we propose the Feature Pyramid Vision Transformer (FPViT), a strong alternative for the MedMNIST Classification Decathlon. FPViT combines the merits of the residual network (ResNet) and the Vision Transformer (ViT), which gives it enhanced feature learning and modeling capabilities. The Transformers in our model take the features extracted by ResNet as sequences to capture global context, which compensates for the locality of convolution operations. Moreover, the feature pyramid in our model effectively utilizes the multi-scale feature maps from the basic layers of ResNet. These multi-scale features, ranging from low level to high level, give our model better adaptability. The final prediction is based on the heads of both the multi-scale ViT and the original ResNet. Experiments show that FPViT achieves better classification and generalization on MedMNIST than state-of-the-art methods.
KW - MedMNIST
KW - Medical image analysis
KW - Multi-scale
KW - Vision Transformer
UR - https://www.scopus.com/pages/publications/85140788175
U2 - 10.1109/IJCNN55064.2022.9892282
DO - 10.1109/IJCNN55064.2022.9892282
M3 - Conference contribution
AN - SCOPUS:85140788175
T3 - Proceedings of the International Joint Conference on Neural Networks
BT - 2022 International Joint Conference on Neural Networks, IJCNN 2022 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 International Joint Conference on Neural Networks, IJCNN 2022
Y2 - 18 July 2022 through 23 July 2022
ER -