LLM-SYM: Integrating Symbolic Methods and Large Language Models for Automated Theorem Proving

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Interactive theorem provers (ITPs) are crucial tools in formal verification, enabling researchers to ensure software reliability and safety in critical systems. While effective, the process of writing proofs using ITPs is costly, involving repetitive tasks. Techniques for automating proofs using symbolic methods and machine learning have been proposed, but both have limitations: symbolic methods struggle with higher-order logic, while machine learning requires extensive training data. Although large language models (LLMs) are increasingly used in software engineering for tasks such as code generation, their effectiveness in theorem proving is hindered by a lack of sufficient training data and the iterative nature of proof generation. To overcome these challenges, we introduce LLM-SYM, an end-to-end framework that integrates symbolic methods with LLMs, enhancing LLMs’ theorem-proving capabilities through fine-tuning. Our approach extracts proof information from human-constructed proofs and employs symbolic methods to rewrite low-level proofs. We enhance data with tactics history and use curriculum sorting to optimize learning. We developed an interactive proof search agent to explore and verify tactics. Testing LLM-SYM on CompCert project reveals improvements over existing methods, and ablation experiments confirm the effectiveness of components within our approach.

Original languageEnglish
Title of host publicationFormal Methods and Software Engineering - 26th International Conference on Formal Engineering Methods, ICFEM 2025, Proceedings
EditorsÉtienne André, Jingyi Wang, Naijun Zhan
PublisherSpringer Science and Business Media Deutschland GmbH
Pages42-59
Number of pages18
ISBN (Print)9789819542123
DOIs
StatePublished - 2026
Event26th International Conference on Formal Engineering Methods, ICFEM 2025 - Hangzhou, China
Duration: 10 Nov 202513 Nov 2025

Publication series

NameLecture Notes in Computer Science
Volume16229 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference26th International Conference on Formal Engineering Methods, ICFEM 2025
Country/TerritoryChina
CityHangzhou
Period10/11/2513/11/25

Keywords

  • Coq
  • Large Language Models(LLM)
  • Theorem Proving

Fingerprint

Dive into the research topics of 'LLM-SYM: Integrating Symbolic Methods and Large Language Models for Automated Theorem Proving'. Together they form a unique fingerprint.

Cite this