TY - JOUR
T1 - Unveiling the power of multimodal large language models for radio astronomical image understanding and question answering
AU - Zhao, Fuyong
AU - Li, Yuyang
AU - Liu, Zhenyu
AU - Chen, Panfeng
AU - Wang, Cunshi
AU - Liu, Jifeng
AU - Li, Hui
AU - Wang, Yanhao
N1 - Publisher Copyright: © 2025 The Author(s). Published by IOP Publishing Ltd.
PY - 2025/12/30
Y1 - 2025/12/30
N2 - Although multimodal large language models (MLLMs) have shown remarkable achievements across various scientific domains, their applications in radio astronomy remain largely unexplored. In this paper, we investigate the potential of MLLMs for image understanding and visual question answering (VQA) in radio astronomy. This can facilitate the use of MLLMs as AI assistants in both research and education by discerning and describing complex astronomical information in human-readable languages. However, general-purpose MLLMs show inferior performance in radio astronomy because they typically lack specialized knowledge. To bridge this gap, we construct a new VQA dataset, RadioAstroVQA, from open data repositories. Specifically, we transform data samples from different repositories into VQA examples by extracting questions based on task descriptions and observation reports associated with images and then composing their answers using ground-truth labels and captions. Furthermore, by leveraging the RadioAstroVQA dataset, we fine-tune two MLLMs of different parameter scales to specifically enhance their capacities for radio astronomical image classification and VQA tasks. Finally, we conduct extensive experiments to show that the fine-tuned MLLMs are capable of handling multiple types of radio astronomical images and generating customized textual output tailored to specific task needs. They achieve accuracy comparable to or even better than that of existing deep learning models for classification tasks. They also demonstrate significantly better performance on VQA tasks compared to several state-of-the-art MLLMs in general domains. These results confirm the potential of MLLMs to serve as specialized AI assistants in the field of radio astronomy.
KW - image classification
KW - multimodal large language models
KW - radio astronomy
KW - visual question answering
UR - https://www.scopus.com/pages/publications/105018860713
U2 - 10.1088/2632-2153/ae0c56
DO - 10.1088/2632-2153/ae0c56
M3 - Article
AN - SCOPUS:105018860713
SN - 2632-2153
VL - 6
JO - Machine Learning: Science and Technology
JF - Machine Learning: Science and Technology
IS - 4
M1 - 045005
ER -