Accelerating vision foundation model for efficient medical image segmentation

  • Xian Tao Wu
  • Xiao Diao Chen
  • Wen Wu*
  • Weiyin Ma
  • Haichuan Song
  *Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Background: The segment anything model (SAM) has recently demonstrated tremendous potential in artificial intelligence-based medical image segmentation. However, its strong performance comes with sharply increased computation costs, because the vision transformer processes extremely long patch sequences with quadratic complexity, which hampers the development of real-time engineering applications.

Purpose: This work aims to accelerate SAM for medical image segmentation while improving segmentation quality and reducing memory usage.

Methods: Instead of upscaling training images to larger resolutions to match the pre-trained SAM's input size, which may introduce image distortions, we propose a convolutional neural network (CNN)-assisted tuning strategy. This approach enables SAM to process smaller inputs directly, significantly reducing the number of patches and memory consumption. Moreover, we observe that the less-informative patches in medical images tend to be well classified earlier than others, which reveals massive computational redundancy in SAM. Based on this observation, a token pausing strategy is presented to identify these easy-to-predict patches in the feed-forward process and then pause their computation. Finally, we combine the CNN-assisted tuning strategy with the token pausing strategy; the combined model meets various run-time requirements by simply adjusting the pause parameters and avoids repeated training.

Results: Extensive experiments on two medical image segmentation benchmarks, namely Synapse and ACDC, demonstrate that our model can run 12× faster than existing SAM-based solutions while achieving better segmentation performance.

Conclusions: This study identifies two practical limitations when applying SAM to medical imaging: resizing low-resolution inputs to match the pre-trained scale and uniformly processing all patches, both of which lead to redundant computation. Therefore, we present an efficient strategy that integrates adapter-based tuning with token pausing to enhance throughput while preserving segmentation performance. Relevant code and models for this paper will be available at https://github.com/wuwen1994/FasterSAM.
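
The two ideas in the Methods paragraph can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation. The first part counts patches and global self-attention pairs for a square input; 1024 and 16 are the input size and patch size of the publicly released SAM image encoder, while the smaller size of 256 is an assumption chosen only for illustration. The second part shows one plausible reading of token pausing: after each layer, tokens whose confidence reaches a threshold are frozen and skipped by later layers. The names `num_patches`, `attention_pairs`, `run_with_pausing`, `toy_layer`, and `toy_confidence`, as well as the threshold criterion itself, are hypothetical stand-ins.

```python
from typing import Callable, List, Sequence, Tuple


def num_patches(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches in a square image."""
    per_side = image_size // patch_size
    return per_side * per_side


def attention_pairs(n_tokens: int) -> int:
    """Token pairs scored by global self-attention (quadratic in length)."""
    return n_tokens * n_tokens


Layer = Callable[[List[float]], List[float]]


def run_with_pausing(
    tokens: Sequence[float],
    layers: Sequence[Layer],
    confidence_fn: Callable[[List[float]], List[float]],
    threshold: float,
) -> Tuple[List[float], int]:
    """Apply layers in order, skipping tokens that have been paused.

    Returns the final token values and the total number of per-layer
    token updates that were actually computed.
    """
    state = list(tokens)              # copy so the caller's data is untouched
    active = [True] * len(state)      # every token starts out active
    processed = 0
    for layer in layers:
        idx = [i for i, is_active in enumerate(active) if is_active]
        if not idx:
            break                     # all tokens paused: stop early
        updated = layer([state[i] for i in idx])
        for i, value in zip(idx, updated):
            state[i] = value
        processed += len(idx)
        # Hypothetical criterion: pause a token once it is confident enough.
        scores = confidence_fn([state[i] for i in idx])
        for i, score in zip(idx, scores):
            if score >= threshold:
                active[i] = False     # frozen: later layers skip this token
    return state, processed


def toy_layer(values: List[float]) -> List[float]:
    """Stand-in for a transformer block: nudges every token upward."""
    return [v + 0.25 for v in values]


def toy_confidence(values: List[float]) -> List[float]:
    """Stand-in for an auxiliary classifier's confidence, capped at 1."""
    return [min(1.0, v) for v in values]


if __name__ == "__main__":
    full, small = num_patches(1024, 16), num_patches(256, 16)
    print(full, small, attention_pairs(full) // attention_pairs(small))
    final, used = run_with_pausing(
        [0.75, 0.0, 0.5, 0.875], [toy_layer] * 4, toy_confidence, 0.9
    )
    print(final, used, "of", 4 * 4, "token updates computed")
```

In this toy run, raising the threshold keeps tokens active for longer and lowering it pauses them sooner, which mirrors how the abstract describes meeting different run-time requirements by adjusting the pause parameters without retraining.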

Original language: English
Article number: e70193
Journal: Medical Physics
Volume: 53
Issue number: 1
DOIs
State: Published - Jan 2026

Keywords

  • deep learning
  • medical image segmentation
  • segment anything model
  • throughput
  • token pausing
