TY - JOUR
T1 - Bootstrapping OTS-Funcimg pre-training model (Botfip)
T2 - a comprehensive multimodal scientific computing framework and its application in symbolic regression task
AU - Chen, Tianhao
AU - Li, Zeyu
AU - Xu, Pengbo
AU - Zheng, Haibiao
N1 - Publisher Copyright:
© The Author(s) 2025.
PY - 2025/10
Y1 - 2025/10
N2 - In the realm of scientific computing, many problem-solving approaches focus primarily on processes and outcomes. Even in AI applications within science, a notable absence of deep multimodal information mining is often observed, with a lack of frameworks analogous to those in the image-text domain. This paper introduces a novel scientific computing multimodal framework based on Function Images (Funcimg) and Operation Tree Skeleton Sequence (OTS), named Bootstrapping OTS-Funcimg Pre-training Model (Botfip), which is inspired by the BLIP model from the image-text field. Botfip employs image encoders such as ViT and sequence encoders like BERT, aligning these encoders during the pre-training phase by applying contrastive learning on a large-scale dataset of Funcimg-OTS pairs. This approach successfully facilitates the multimodal information mining of functions, serving as the foundation for completing corresponding downstream tasks such as symbolic regression (SR). Experiments in this paper demonstrate Botfip’s exceptional capability to mine multimodal symbolic and numerical information during the pre-training phase and highlight its performance in SR tasks, especially in tackling low-complexity SR problems. As a multimodal framework, Botfip shows promising potential for future applications across a broader spectrum of scientific computing challenges.
AB - In the realm of scientific computing, many problem-solving approaches focus primarily on processes and outcomes. Even in AI applications within science, a notable absence of deep multimodal information mining is often observed, with a lack of frameworks analogous to those in the image-text domain. This paper introduces a novel scientific computing multimodal framework based on Function Images (Funcimg) and Operation Tree Skeleton Sequence (OTS), named Bootstrapping OTS-Funcimg Pre-training Model (Botfip), which is inspired by the BLIP model from the image-text field. Botfip employs image encoders such as ViT and sequence encoders like BERT, aligning these encoders during the pre-training phase by applying contrastive learning on a large-scale dataset of Funcimg-OTS pairs. This approach successfully facilitates the multimodal information mining of functions, serving as the foundation for completing corresponding downstream tasks such as symbolic regression (SR). Experiments in this paper demonstrate Botfip’s exceptional capability to mine multimodal symbolic and numerical information during the pre-training phase and highlight its performance in SR tasks, especially in tackling low-complexity SR problems. As a multimodal framework, Botfip shows promising potential for future applications across a broader spectrum of scientific computing challenges.
KW - Artificial intelligence
KW - Multimodal learning
KW - Scientific computing
KW - Symbolic regression
UR - https://www.scopus.com/pages/publications/105013591585
U2 - 10.1007/s40747-025-02052-y
DO - 10.1007/s40747-025-02052-y
M3 - Article
AN - SCOPUS:105013591585
SN - 2199-4536
VL - 11
JO - Complex & Intelligent Systems
JF - Complex & Intelligent Systems
IS - 10
M1 - 417
ER -