
VerifiAgent: A Unified Verification Agent in Language Model Reasoning

April 1, 2025
Authors: Jiuzhou Han, Wray Buntine, Ehsan Shareghi
cs.AI

Abstract

Large language models demonstrate remarkable reasoning capabilities but often produce unreliable or incorrect responses. Existing verification methods are typically model-specific or domain-restricted, require significant computational resources, and do not scale across diverse reasoning tasks. To address these limitations, we propose VerifiAgent, a unified verification agent that integrates two levels of verification: meta-verification, which assesses the completeness and consistency of model responses, and tool-based adaptive verification, in which VerifiAgent autonomously selects appropriate verification tools based on the reasoning type (mathematical, logical, or commonsense reasoning). This adaptive approach ensures both efficiency and robustness across different verification scenarios. Experimental results show that VerifiAgent outperforms baseline verification methods (e.g., deductive verifier, backward verifier) across all reasoning tasks. Additionally, it can further enhance reasoning accuracy by leveraging feedback from verification results. VerifiAgent can also be applied effectively to inference scaling, achieving better results with fewer generated samples and lower cost than existing process reward models in the mathematical reasoning domain. Code is available at https://github.com/Jiuzhouh/VerifiAgent.
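
The two-level design described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name here (`Verdict`, `meta_verify`, `math_tool`, `TOOLS`, `verify`) is hypothetical, the checks are toy stand-ins, and the reasoning type is passed in by the caller, whereas the abstract says VerifiAgent selects the tool autonomously.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Verdict:
    passed: bool
    stage: str     # "meta" or "tool": which level produced the verdict
    feedback: str  # text that could be returned to the reasoner


def meta_verify(response: str) -> bool:
    # Hypothetical completeness check: require an explicit "Answer:" marker.
    return re.search(r"answer\s*:", response, re.IGNORECASE) is not None


def math_tool(response: str) -> bool:
    # Toy stand-in for a math tool: recompute every "a + b = c" claim.
    claims = re.findall(r"(-?\d+)\s*\+\s*(-?\d+)\s*=\s*(-?\d+)", response)
    return all(int(a) + int(b) == int(c) for a, b, c in claims)


def always_pass(response: str) -> bool:
    # Placeholder for tools this sketch does not model (logic, commonsense).
    return True


# Assumed mapping from reasoning type to verification tool.
TOOLS: Dict[str, Callable[[str], bool]] = {
    "math": math_tool,
    "logic": always_pass,
    "commonsense": always_pass,
}


def verify(response: str, reasoning_type: str) -> Verdict:
    # Level 1: meta-verification runs first and can reject early.
    if not meta_verify(response):
        return Verdict(False, "meta", "No final answer found; response looks incomplete.")
    # Level 2: dispatch to the tool registered for this reasoning type.
    tool = TOOLS[reasoning_type]
    if not tool(response):
        return Verdict(False, "tool", f"The {reasoning_type} tool rejected a step.")
    return Verdict(True, "tool", "No issues found.")
```

For example, `verify("2 + 3 = 6. Answer: 6", "math")` fails at the tool stage, and its `feedback` string is the kind of signal the abstract describes feeding back to improve reasoning accuracy.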
