Enabling Scalable Oversight via Self-Evolving Critic

January 10, 2025
Authors: Zhengyang Tang, Ziniu Li, Zhenyang Xiao, Tian Ding, Ruoyu Sun, Benyou Wang, Dayiheng Liu, Fei Huang, Tianyu Liu, Bowen Yu, Junyang Lin
cs.AI

Abstract
Despite their remarkable performance, Large Language Models (LLMs) face a critical challenge in scalable oversight: providing effective feedback for tasks where human evaluation is difficult or where LLMs outperform humans. While there is growing interest in using LLMs for critique, current approaches still rely on human annotations or more powerful models, leaving the question of how to enhance critique capabilities without external supervision unresolved. We introduce SCRIT (Self-evolving CRITic), a framework that enables genuine self-evolution of critique abilities. Technically, SCRIT self-improves by training on synthetic data generated by a contrastive self-critic, which uses reference solutions to produce step-by-step critiques, and filtered by a self-validation mechanism that ensures critique quality through correction outcomes. Implemented with Qwen2.5-72B-Instruct, one of the most powerful LLMs, SCRIT achieves up to a 10.3% improvement on critique-correction and error-identification benchmarks. Our analysis reveals that SCRIT's performance scales positively with data and model size, outperforms alternative approaches, and benefits critically from its self-validation component.
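
To make the pipeline concrete, here is a minimal Python sketch of the data-construction loop the abstract describes: a contrastive self-critic produces critiques against reference solutions, and a self-validation step keeps only critiques whose corrections reach the right answer. All names here (`Problem`, `critic`, `corrector`, the field names) are hypothetical placeholders, not the authors' implementation; this is an illustration of the idea, not SCRIT itself.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical container for one training instance; field names are
# illustrative, not taken from the paper.
@dataclass
class Problem:
    question: str
    student_solution: str    # a (possibly flawed) solution to critique
    reference_solution: str  # ground-truth solution consulted contrastively
    reference_answer: str    # final answer used for self-validation

def build_scrit_data(
    critic: Callable[[str, str, str], str],   # LLM call: step-by-step critique
    corrector: Callable[[str, str], str],     # LLM call: revise per the critique
    problems: List[Problem],
) -> List[Tuple[Problem, str]]:
    """Keep only self-generated critiques that pass self-validation."""
    validated = []
    for p in problems:
        # Contrastive self-critic: critique the student solution step by
        # step while consulting the reference solution.
        critique = critic(p.question, p.student_solution, p.reference_solution)
        # Self-validation: a critique counts as high quality only if
        # applying its correction yields the correct final answer.
        revised_answer = corrector(p.student_solution, critique)
        if revised_answer == p.reference_answer:
            validated.append((p, critique))
    return validated

# The validated (problem, critique) pairs would then be used to fine-tune
# the same model, closing the self-evolution loop without external supervision.
```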
