RadLAS: A Foundation Model for Interpretable Radiography Image Analysis with Lesion-Aware Self-Supervised Pre-training

  • Yihang Liu
  • Ying Wen
  • Longzhen Yang*
  • Lianghua He
  • Heng Tao Shen

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Medical Foundation Models (MFMs) are revolutionizing radiography image analysis with scalable and generalized diagnostic capabilities. However, their effectiveness in real-world clinical practice is limited by insufficient interpretability. To address this limitation, we propose RadLAS, a novel MFM for interpretable Radiographic image analysis that introduces Lesion-Aware Self-supervised pre-training. Unlike conventional MFMs that rely on post-hoc explanations, RadLAS directly emulates human diagnostic reasoning: it first grounds lesion evidence and then makes decisions accordingly. Specifically, RadLAS introduces two self-supervised tasks: (I) Lesion-grounded Reconstruction, which learns structured anatomical representations by restoring lesion-aware image patches to their healthy counterparts, thereby enabling pixel-level grounding of lesion evidence via input-normal contrast; and (II) Lesion-discrimination Contrastive Learning, which enhances lesion-aware patterns in the representations by explicitly decoupling grounded lesion evidence into clinical cues and aligning them with global semantics, thereby enabling direct lesion-oriented diagnosis while preserving global context. RadLAS demonstrates excellent performance across diverse downstream radiographic datasets, offering verifiable explanations by deriving specific diagnoses (Task II) from grounded lesion evidence (Task I), while preserving the generalized representations essential for high diagnostic accuracy. Extensive experiments demonstrate that RadLAS (i) achieves superior interpretability, with highly correlated lesion prediction and localization, surpassing 11 interpretable medical models; and (ii) delivers scalable representation learning, outperforming 14 state-of-the-art supervised and self-supervised MFMs.
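The two pre-training objectives described above can be sketched in a few lines. This is a minimal, illustrative PyTorch outline under assumed interfaces (a decoder producing a "healthy" restoration and feature vectors for lesion cues and global semantics); none of the function names, shapes, or loss weightings below come from the paper itself.

```python
# Hypothetical sketch of RadLAS-style objectives, not the authors' code.
import torch
import torch.nn.functional as F

def reconstruction_loss(decoded_healthy, target_healthy):
    # Task I (Lesion-grounded Reconstruction): train the model to restore
    # lesion-aware patches to healthy-looking counterparts.
    return F.l1_loss(decoded_healthy, target_healthy)

def lesion_evidence(inputs, decoded_healthy):
    # Pixel-level grounding via input-vs-normal contrast: large residuals
    # between the input and its restored "healthy" version mark lesions.
    return (inputs - decoded_healthy).abs().mean(dim=1, keepdim=True)

def contrastive_loss(lesion_feats, global_feats, temperature=0.07):
    # Task II (Lesion-discrimination Contrastive Learning): align decoupled
    # lesion cues with global image semantics via an InfoNCE-style loss,
    # with matching lesion/global pairs as positives on the diagonal.
    z_l = F.normalize(lesion_feats, dim=1)
    z_g = F.normalize(global_feats, dim=1)
    logits = z_l @ z_g.t() / temperature      # (B, B) similarity matrix
    targets = torch.arange(z_l.size(0))       # positives: i-th with i-th
    return F.cross_entropy(logits, targets)

if __name__ == "__main__":
    torch.manual_seed(0)
    x = torch.rand(4, 3, 32, 32)              # toy batch of "radiographs"
    x_restored = torch.rand(4, 3, 32, 32)     # stand-in decoder output
    rec = reconstruction_loss(x_restored, x)
    evidence = lesion_evidence(x, x_restored) # (4, 1, 32, 32) lesion map
    nce = contrastive_loss(torch.rand(4, 128), torch.rand(4, 128))
    print(rec.item(), evidence.shape, nce.item())
```

In this reading, the evidence map from Task I supplies the region from which lesion features would be pooled for Task II, which is how the grounded evidence and the final diagnosis stay coupled.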

Original language: English
Title of host publication: MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025
Publisher: Association for Computing Machinery, Inc
Pages: 10847-10856
Number of pages: 10
ISBN (Electronic): 9798400720352
DOIs
State: Published - 27 Oct 2025
Event: 33rd ACM International Conference on Multimedia, MM 2025 - Dublin, Ireland
Duration: 27 Oct 2025 - 31 Oct 2025

Publication series

Name: MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025

Conference

Conference: 33rd ACM International Conference on Multimedia, MM 2025
Country/Territory: Ireland
City: Dublin
Period: 27/10/25 - 31/10/25

Keywords

  • interpretable radiography image analysis
  • medical foundation model
  • self-supervised representation learning
