Patience Is The Key to Large Language Model Reasoning

November 20, 2024
Author: Yijiong Yu
cs.AI

Abstract

Recent advancements in the field of large language models, particularly through the Chain of Thought (CoT) approach, have demonstrated significant improvements in solving complex problems. However, existing models either tend to sacrifice detailed reasoning for brevity due to user preferences, or require extensive and expensive training data to learn complicated reasoning abilities, limiting their potential for solving complex tasks. To bridge this gap, following the concept of test-time scaling, we propose a simple method that encourages models to adopt a more patient reasoning style without introducing new knowledge or skills. Employing a preference optimization approach, we generate detailed reasoning processes as positive examples and simple answers as negative examples, thereby training the model to favor thoroughness in its responses. Our results demonstrate a performance increase of up to 6.7% on GSM8k while training on only a lightweight dataset.
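
The abstract gives no implementation details, but the setup it describes, pairing a detailed chain-of-thought response (preferred) with a terse answer (dispreferred) for the same question, maps naturally onto a standard preference-optimization dataset such as the prompt/chosen/rejected format used by DPO-style trainers. The sketch below is a minimal, hypothetical illustration of that data construction; the helper functions and field names are assumptions for illustration, not the authors' code.

```python
# Hypothetical sketch: build preference pairs where the detailed, "patient"
# reasoning trace is the chosen response and the short answer is rejected.
# generate_detailed / generate_brief are placeholders standing in for
# sampling the same model with different prompting styles.

def generate_detailed(question: str) -> str:
    """Placeholder: a step-by-step chain-of-thought style answer."""
    return f"Let's work through this step by step. {question} ..."

def generate_brief(question: str) -> str:
    """Placeholder: a terse, answer-only response."""
    return "The answer is ..."

def build_preference_pairs(questions):
    """Return records in the prompt/chosen/rejected layout commonly
    consumed by preference-optimization trainers (e.g. DPO)."""
    return [
        {
            "prompt": q,
            "chosen": generate_detailed(q),   # thorough reasoning (positive example)
            "rejected": generate_brief(q),    # direct answer (negative example)
        }
        for q in questions
    ]

if __name__ == "__main__":
    pairs = build_preference_pairs(
        ["If 3 pens cost $6, how much do 7 pens cost?"]
    )
    print(pairs[0])
```

Such pairs could then be fed to an off-the-shelf preference-optimization trainer; the key design choice the paper highlights is that both responses come from the same model, so training only shifts its preferred reasoning style rather than adding new knowledge.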
