LADDER: Self-Improving LLMs Through Recursive Problem Decomposition

March 2, 2025
作者: Toby Simonds, Akira Yoshiyama
cs.AI

Abstract

We introduce LADDER (Learning through Autonomous Difficulty-Driven Example Recursion), a framework which enables Large Language Models to autonomously improve their problem-solving capabilities through self-guided learning by recursively generating and solving progressively simpler variants of complex problems. Unlike prior approaches that require curated datasets or human feedback, LADDER leverages a model's own capabilities to generate easier question variants. We demonstrate LADDER's effectiveness in the subject of mathematical integration, improving Llama 3.2 3B's accuracy from 1% to 82% on undergraduate-level problems and enabling Qwen2.5 7B Deepseek-R1 Distilled to achieve 73% on the MIT Integration Bee qualifying examination. We also introduce TTRL (Test-Time Reinforcement Learning), where we perform reinforcement learning on variants of test problems at inference time. TTRL enables Qwen2.5 7B Deepseek-R1 Distilled to achieve a state-of-the-art score of 90% on the MIT Integration Bee qualifying examination, surpassing OpenAI o1's performance. These results show how self-directed strategic learning can achieve significant capability improvements without relying on architectural scaling or human supervision.
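The loop the abstract describes — generate easier variants of a hard problem, solve them, and reinforce verified solutions before returning to the original — can be sketched in a few lines. The sketch below is our own illustration under stated assumptions, not code from the paper: the helpers `generate_variants`, `attempt_solution`, and `verify` are hypothetical stand-ins for the model's variant generator, its solver, and an answer checker (for integration, plausibly a numerical check), and the integer difficulty counter is a toy proxy for real problem difficulty.

```python
# A minimal sketch of the LADDER loop, not code from the paper. All helper
# names here are hypothetical stand-ins for the components the abstract
# describes; the difficulty counter is a toy proxy.

from dataclasses import dataclass


@dataclass
class Problem:
    statement: str
    difficulty: int  # toy proxy: 0 means easy enough to solve reliably


def generate_variants(problem: Problem, n: int = 3) -> list[Problem]:
    # Stand-in for prompting the model to write n easier variants.
    return [
        Problem(f"simplified({problem.statement})", problem.difficulty - 1)
        for _ in range(n)
    ]


def attempt_solution(problem: Problem) -> str:
    # Stand-in for the model producing a candidate solution.
    return f"solution({problem.statement})"


def verify(problem: Problem, solution: str) -> bool:
    # Stand-in for checking the answer (e.g., numerically for integrals).
    # Toy rule: only sufficiently simplified variants are solvable.
    return problem.difficulty <= 0


def ladder(problem: Problem, trajectory: list | None = None) -> list:
    """Recurse into easier variants until they become solvable, collecting
    (problem, solution, reward) tuples for the reinforcement-learning step."""
    trajectory = trajectory if trajectory is not None else []
    solution = attempt_solution(problem)
    if verify(problem, solution):
        trajectory.append((problem, solution, 1.0))
        return trajectory
    for variant in generate_variants(problem):
        ladder(variant, trajectory)  # build competence on easier variants
    # Placeholder for retrying the original problem after an RL update on
    # the variant trajectories; here it simply records a failed attempt.
    trajectory.append((problem, attempt_solution(problem), 0.0))
    return trajectory


if __name__ == "__main__":
    examples = ladder(Problem("integrate x * exp(x**2) dx", difficulty=2))
    print(f"collected {len(examples)} training examples")
```

Per the abstract, TTRL applies the same idea at inference time: variants of each test problem are generated and used for reinforcement learning before the model answers the original question.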

