Unveiling the power of multimodal large language models for radio astronomical image understanding and question answering

Fuyong Zhao, Yuyang Li, Zhenyu Liu, Panfeng Chen, Cunshi Wang, Jifeng Liu, Hui Li, Yanhao Wang

Research output: Contribution to journal › Article › peer-review

Abstract

Although multimodal large language models (MLLMs) have shown remarkable achievements across various scientific domains, their applications in radio astronomy remain largely unexplored. In this paper, we investigate the potential of MLLMs for image understanding and visual question answering (VQA) in radio astronomy. This can facilitate the use of MLLMs as AI assistants in both research and education by discerning and describing complex astronomical information in human-readable language. However, general-purpose MLLMs perform poorly in radio astronomy because they typically lack specialized knowledge. To bridge this gap, we construct a new VQA dataset, RadioAstroVQA, from open data repositories. Specifically, we transform data samples from different repositories into VQA examples by extracting questions based on the task descriptions and observation reports associated with images and then composing their answers from ground-truth labels and captions. Furthermore, by leveraging the RadioAstroVQA dataset, we fine-tune two MLLMs of different parameter scales to enhance their capabilities for radio astronomical image classification and VQA tasks. Finally, we conduct extensive experiments to show that the fine-tuned MLLMs are capable of handling multiple types of radio astronomical images and generating textual output tailored to specific task needs. They achieve accuracy comparable to or even better than that of existing deep learning models on classification tasks. They also perform significantly better on VQA tasks than several state-of-the-art general-domain MLLMs. These results confirm the potential of MLLMs to serve as specialized AI assistants in the field of radio astronomy.

Original language: English
Article number: 045005
Journal: Machine Learning: Science and Technology
Volume: 6
Issue number: 4
State: Published - 30 Dec 2025

Keywords

  • image classification
  • multimodal large language models
  • radio astronomy
  • visual question answering
