Patience Is The Key to Large Language Model Reasoning
November 20, 2024
Authors: Yijiong Yu
cs.AI
Abstract
Recent advancements in the field of large language models, particularly through the Chain of Thought (CoT) approach, have demonstrated significant improvements in solving complex problems. However, existing models either sacrifice detailed reasoning for brevity to match user preferences, or require extensive and expensive training data to learn complex reasoning abilities, limiting their potential for solving complex tasks. To bridge this gap, and following the concept of test-time scaling, we propose a simple method that encourages models to adopt a more patient reasoning style without introducing new knowledge or skills. Employing a preference optimization approach, we generate detailed reasoning processes as positive examples and simple answers as negative examples, thereby training the model to favor thoroughness in its responses. Our results demonstrate a performance increase of up to 6.7% on GSM8k from training on just a lightweight dataset.
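
The core of the method, as the abstract describes it, is the construction of preference pairs: for each question, a detailed chain-of-thought solution serves as the preferred response and a terse answer as the dispreferred one. Below is a minimal sketch of how such a pair could be assembled, using the prompt/chosen/rejected field convention common to DPO-style trainers (e.g. TRL's DPOTrainer); the helper function and the GSM8k-style example are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch (assumed, not the paper's implementation) of building a
# preference pair for DPO-style training: a patient, step-by-step solution
# is marked "chosen" and a brief answer to the same question "rejected".

def build_preference_pair(question: str, detailed_cot_answer: str,
                          brief_answer: str) -> dict:
    """Pair a thorough chain-of-thought solution (preferred) with a terse
    answer (dispreferred) for the same prompt."""
    return {
        "prompt": question,
        "chosen": detailed_cot_answer,   # detailed reasoning process (positive example)
        "rejected": brief_answer,        # simple answer (negative example)
    }

# Example with a GSM8k-style word problem:
pair = build_preference_pair(
    question=(
        "Natalia sold 48 clips in April, and then she sold half as many "
        "clips in May. How many clips did Natalia sell altogether?"
    ),
    detailed_cot_answer=(
        "In April, Natalia sold 48 clips. In May she sold half as many, "
        "which is 48 / 2 = 24 clips. Altogether she sold 48 + 24 = 72 "
        "clips. The answer is 72."
    ),
    brief_answer="72",
)
print(pair["chosen"])
```

Optimizing on such pairs rewards the model for thoroughness rather than teaching it any new facts, which matches the paper's claim that the gains come from a more patient reasoning style rather than from new knowledge or skills.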