TY - GEN
T1 - FrameProver
T2 - 21st International Conference on Intelligent Computing, ICIC 2025
AU - Shan, Haojia
AU - Xiong, Beibei
AU - Qi, Niuniu
AU - Yang, Zhengfeng
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
PY - 2025
Y1 - 2025
N2 - Formal theorem proving with large language models (LLMs) remains a significant challenge due to the inherent complexity of structured reasoning and the scarcity of formal proof data. While recent efforts have focused on addressing the limited availability of formal proof data through techniques such as data augmentation and automatic formalization, these approaches continue to face substantial challenges, leaving the problem of data scarcity unresolved. Therefore, to better leverage existing formal data, we propose FrameProver, a whole-proof generation model that first constructs an intermediate formal framework to guide the generation of complete proofs. To support this, we analyze proof patterns in Lean 4 and design formal framework templates, which are used to generate training data with the help of our tool, REPL4Framework. Based on the Lean-workbook-proofs dataset, we construct a balanced dataset of 21.3k examples, including 10.6k examples with formal frameworks. FrameProver is fine-tuned from DeepSeek-Prover-V1.5-Base on this dataset and achieves 40.2% accuracy (pass@128, from 37.3% without the framework) on the MiniF2F-Test benchmark.
AB - Formal theorem proving with large language models (LLMs) remains a significant challenge due to the inherent complexity of structured reasoning and the scarcity of formal proof data. While recent efforts have focused on addressing the limited availability of formal proof data through techniques such as data augmentation and automatic formalization, these approaches continue to face substantial challenges, leaving the problem of data scarcity unresolved. Therefore, to better leverage existing formal data, we propose FrameProver, a whole-proof generation model that first constructs an intermediate formal framework to guide the generation of complete proofs. To support this, we analyze proof patterns in Lean 4 and design formal framework templates, which are used to generate training data with the help of our tool, REPL4Framework. Based on the Lean-workbook-proofs dataset, we construct a balanced dataset of 21.3k examples, including 10.6k examples with formal frameworks. FrameProver is fine-tuned from DeepSeek-Prover-V1.5-Base on this dataset and achieves 40.2% accuracy (pass@128, from 37.3% without the framework) on the MiniF2F-Test benchmark.
KW - Formal Theorem Proving
KW - Large Language Model
KW - Mathematical Reasoning
UR - https://www.scopus.com/pages/publications/105012925690
U2 - 10.1007/978-981-96-9884-4_16
DO - 10.1007/978-981-96-9884-4_16
M3 - 会议稿件
AN - SCOPUS:105012925690
SN - 9789819698837
T3 - Lecture Notes in Computer Science
SP - 188
EP - 200
BT - Advanced Intelligent Computing Technology and Applications - 21st International Conference, ICIC 2025, Proceedings
A2 - Huang, De-Shuang
A2 - Zhang, Chuanlei
A2 - Zhang, Qinhu
A2 - Pan, Yijie
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 26 July 2025 through 29 July 2025
ER -