ProgCo: Program Helps Self-Correction of Large Language Models

Abstract

Self-correction aims to enable large language models (LLMs) to self-verify and self-refine their initial responses without external feedback. However, LLMs often fail to self-verify effectively and to generate correct feedback, which misleads refinement and causes self-correction to fail, especially in complex reasoning tasks. In this paper, we propose Program-driven Self-Correction (ProgCo). First, program-driven verification (ProgVe) achieves complex verification logic and extensive validation through self-generated, self-executing verification pseudo-programs. Then, program-driven refinement (ProgRe) receives feedback from ProgVe and conducts dual reflection and refinement on both responses and verification programs, mitigating the misleading effect of incorrect feedback in complex reasoning tasks. Experiments on three instruction-following and mathematical benchmarks indicate that ProgCo achieves effective self-correction and that its performance can be further improved when combined with real program tools.
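
The verify-then-refine loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`self_correct`, `make_verifier`, `refine`, `reflect_on_verifier`), the round limit, and the toy task are all assumptions made for illustration. The model calls are replaced by deterministic stand-ins, and the verifier is an ordinary Python function rather than a pseudo-program executed by the LLM itself.

```python
from typing import Callable, Tuple

# Hypothetical type: a verifier maps a response to (passed, feedback).
Verifier = Callable[[str], Tuple[bool, str]]


def self_correct(task: str,
                 generate: Callable[[str], str],
                 make_verifier: Callable[[str], Verifier],
                 refine: Callable[[str, str, str], str],
                 reflect_on_verifier: Callable[[str, Verifier, str], Verifier],
                 max_rounds: int = 3) -> str:
    """Illustrative loop: verify, then refine both response and verifier."""
    response = generate(task)          # initial response
    verifier = make_verifier(task)     # stands in for the verification program
    for _ in range(max_rounds):
        passed, feedback = verifier(response)
        if passed:
            break
        # Dual refinement: revise the response and revisit the verifier,
        # since the feedback itself may be wrong.
        response = refine(task, response, feedback)
        verifier = reflect_on_verifier(task, verifier, feedback)
    return response


# Toy stand-ins for model calls (assumptions, not part of the paper).
def toy_generate(task: str) -> str:
    return "hello there my friend"


def toy_make_verifier(task: str) -> Verifier:
    def verify(response: str) -> Tuple[bool, str]:
        if not response.isupper():
            return False, "use uppercase"
        if len(response.split()) > 3:
            return False, "use at most 3 words"
        return True, "ok"
    return verify


def toy_refine(task: str, response: str, feedback: str) -> str:
    if "uppercase" in feedback:
        return response.upper()
    if "3 words" in feedback:
        return " ".join(response.split()[:3])
    return response


def toy_reflect(task: str, verifier: Verifier, feedback: str) -> Verifier:
    return verifier  # the toy verifier is assumed correct, so keep it


result = self_correct("Reply in uppercase using at most 3 words.",
                      toy_generate, toy_make_verifier, toy_refine, toy_reflect)
print(result)
```

In this toy run the response is revised twice, once per failed check, before the verifier accepts it. A real system would replace each stand-in with a model call and would need a policy for the case where the round limit is reached without passing verification.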
