TY - GEN
T1 - Logic-Regularized Verifier Elicits Reasoning from LLMs
AU - Wang, Xinyu
AU - Sun, Changzhi
AU - Cheng, Lian
AU - Wu, Yuanbin
AU - Zhang, Dell
AU - Wang, Xiaoling
AU - Li, Xuelong
N1 - Publisher Copyright:
© 2025 Association for Computational Linguistics.
PY - 2025
Y1 - 2025
AB - Verifiers are crucial components for enhancing modern LLMs' reasoning capability. Typical verifiers require resource-intensive supervised dataset construction, which is costly and faces limitations in data diversity. In this paper, we propose LOVER, an unsupervised verifier regularized by logical rules. LOVER treats the verifier as a binary latent variable, utilizing internal activations and enforcing three logical constraints on multiple reasoning paths: negation consistency, intra-group consistency, and inter-group consistency (grouped by the final answer). By incorporating logical rules as priors, LOVER can leverage unlabeled examples and is directly compatible with any off-the-shelf LLMs. Experiments on 10 datasets demonstrate that LOVER significantly outperforms unsupervised baselines, achieving performance comparable to the supervised verifier (reaching its 95% level on average). The source code is publicly available at https://github.com/wangxinyufighting/llm-lover.
UR - https://www.scopus.com/pages/publications/105021027634
M3 - Conference contribution
AN - SCOPUS:105021027634
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 32617
EP - 32630
BT - Long Papers
A2 - Che, Wanxiang
A2 - Nabende, Joyce
A2 - Shutova, Ekaterina
A2 - Pilehvar, Mohammad Taher
PB - Association for Computational Linguistics (ACL)
T2 - 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025
Y2 - 27 July 2025 through 1 August 2025
ER -