ProgCo: Program Helps Self-Correction of Large Language Models

January 2, 2025
Authors: Xiaoshuai Song, Yanan Wu, Weixun Wang, Jiaheng Liu, Wenbo Su, Bo Zheng
cs.AI

Abstract

Self-correction aims to enable large language models (LLMs) to self-verify and self-refine their initial responses without external feedback. However, LLMs often fail to self-verify effectively and to generate correct feedback, which misleads refinement and causes self-correction to fail, especially in complex reasoning tasks. In this paper, we propose Program-driven Self-Correction (ProgCo). First, program-driven verification (ProgVe) achieves complex verification logic and extensive validation through self-generated, self-executing verification pseudo-programs. Then, program-driven refinement (ProgRe) receives feedback from ProgVe and conducts dual reflection and refinement on both responses and verification programs to mitigate the misleading effect of incorrect feedback in complex reasoning tasks. Experiments on three instruction-following and mathematical benchmarks indicate that ProgCo achieves effective self-correction, and that it can further enhance performance when combined with real program tools.
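
The two stages described in the abstract can be pictured as a single loop. The following Python sketch is illustrative only and is not the authors' implementation: the function name `progco`, the prompt wording, the PASS/FAIL convention, the `max_rounds` parameter, and the `toy_llm` stub are all assumptions introduced here, since the abstract does not specify them.

```python
from typing import Callable

# Hypothetical type: any function mapping a prompt string to a completion string.
LLM = Callable[[str], str]


def progco(task: str, llm: LLM, max_rounds: int = 3) -> str:
    """Illustrative self-correction loop; prompts and control flow are assumptions."""
    response = llm(f"Solve the task.\nTask: {task}")
    # ProgVe, step 1: the model writes its own verification pseudo-program.
    program = llm(f"Write a verification pseudo-program.\nTask: {task}")
    for _ in range(max_rounds):
        # ProgVe, step 2: the model itself "executes" the pseudo-program
        # against the current response and reports a verdict with feedback.
        verdict = llm(
            "Execute the verification program on the response. "
            "Reply PASS, or FAIL with feedback.\n"
            f"Program: {program}\nResponse: {response}"
        )
        if verdict.strip().upper().startswith("PASS"):
            break
        # ProgRe: dual refinement. The response is refined using the feedback,
        # and the verification program is also reconsidered, because the
        # feedback itself may be wrong.
        response = llm(
            "Refine the response using the feedback.\n"
            f"Task: {task}\nResponse: {response}\nFeedback: {verdict}"
        )
        program = llm(
            "Reflect on the verification program and fix it if it is faulty.\n"
            f"Task: {task}\nProgram: {program}\nFeedback: {verdict}"
        )
    return response


def toy_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model, for demonstration only."""
    if prompt.startswith("Solve"):
        return "5"  # deliberately wrong first answer to "2 + 2"
    if prompt.startswith("Write"):
        return "check that the answer equals 2 + 2"
    if prompt.startswith("Execute"):
        if "Response: 4" in prompt:
            return "PASS"
        return "FAIL: the answer should equal 2 + 2"
    if prompt.startswith("Refine"):
        return "4"
    # Program reflection: this stub leaves the program unchanged.
    return "check that the answer equals 2 + 2"


if __name__ == "__main__":
    print(progco("What is 2 + 2?", toy_llm))
```

With the stub above, the first answer fails verification, is refined once, and then passes. In practice `llm` would wrap a real model call, and the abstract suggests that the self-execution step could instead be delegated to a real program tool.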
