Enabling Scalable Oversight via Self-Evolving Critic
January 10, 2025
作者: Zhengyang Tang, Ziniu Li, Zhenyang Xiao, Tian Ding, Ruoyu Sun, Benyou Wang, Dayiheng Liu, Fei Huang, Tianyu Liu, Bowen Yu, Junyang Lin
cs.AI
Abstract
Despite the remarkable performance of Large Language Models (LLMs), their
development faces a critical challenge in scalable oversight: providing
effective feedback for tasks where human evaluation is difficult or where LLMs
outperform humans. While there is growing interest in using LLMs for critique,
current approaches still rely on human annotations or more powerful models,
leaving the issue of enhancing critique capabilities without external
supervision unresolved. We introduce SCRIT (Self-evolving CRITic), a framework
that enables genuine self-evolution of critique abilities. Technically, SCRIT
self-improves by training on synthetic data, generated by a contrastive-based
self-critic that uses reference solutions for step-by-step critique, and a
self-validation mechanism that ensures critique quality through correction
outcomes. Implemented with Qwen2.5-72B-Instruct, one of the most powerful LLMs,
SCRIT achieves up to a 10.3% improvement on critique-correction and error
identification benchmarks. Our analysis reveals that SCRIT's performance scales
positively with data and model size, outperforms alternative approaches, and
benefits critically from its self-validation component.
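
The abstract describes a two-part data pipeline: a contrastive-based self-critic that critiques candidate solutions step by step against reference solutions, and a self-validation filter that keeps a critique only if it leads to a successful correction. The sketch below illustrates that loop in minimal Python; the function names, prompt wording, and data fields are illustrative assumptions rather than the paper's actual implementation, and the model is abstracted as a prompt-to-text callable.

```python
from typing import Callable, Dict, List

# Hypothetical model interface: any callable mapping a prompt to a completion.
Model = Callable[[str], str]


def extract_final_answer(text: str) -> str:
    """Illustrative helper: treat the last non-empty line as the final answer."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    return lines[-1] if lines else ""


def contrastive_self_critic(model: Model, problem: str,
                            flawed_solution: str, reference_solution: str) -> str:
    """Critique a candidate solution step by step, contrasting it against a
    known-correct reference solution (the contrastive-based self-critic)."""
    prompt = (
        f"Problem:\n{problem}\n\n"
        f"Reference solution (correct):\n{reference_solution}\n\n"
        f"Candidate solution (may contain errors):\n{flawed_solution}\n\n"
        "Compare the candidate against the reference step by step, locate the "
        "first erroneous step, and explain the error."
    )
    return model(prompt)


def self_validate(model: Model, problem: str, flawed_solution: str,
                  critique: str, reference_answer: str) -> bool:
    """Keep a critique only if it enables a correct fix: critique quality is
    judged by the correction outcome, not by the critique text itself."""
    prompt = (
        f"Problem:\n{problem}\n\n"
        f"Solution:\n{flawed_solution}\n\n"
        f"Critique:\n{critique}\n\n"
        "Using the critique, write a corrected solution ending with the final answer."
    )
    return extract_final_answer(model(prompt)) == reference_answer


def build_synthetic_critiques(model: Model,
                              dataset: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Collect self-validated (problem, solution, critique) triples; the model
    is then fine-tuned on this data, closing the self-evolution loop."""
    kept = []
    for ex in dataset:
        critique = contrastive_self_critic(
            model, ex["problem"], ex["flawed_solution"], ex["reference_solution"])
        if self_validate(model, ex["problem"], ex["flawed_solution"],
                         critique, ex["reference_answer"]):
            kept.append({"problem": ex["problem"],
                         "flawed_solution": ex["flawed_solution"],
                         "critique": critique})
    return kept
```

Under this reading, fine-tuning the model on the returned triples and re-running the pipeline with the improved model would yield the iterative self-evolution the abstract refers to, with no human annotations or stronger external model in the loop.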