Reverse Thinking Makes LLMs Stronger Reasoners
November 29, 2024
Authors: Justin Chih-Yao Chen, Zifeng Wang, Hamid Palangi, Rujun Han, Sayna Ebrahimi, Long Le, Vincent Perot, Swaroop Mishra, Mohit Bansal, Chen-Yu Lee, Tomas Pfister
cs.AI
Abstract
Reverse thinking plays a crucial role in human reasoning. Humans can reason
not only from a problem to a solution but also in reverse, i.e., start from the
solution and reason towards the problem. This often enhances overall reasoning
performance as it enables consistency checks between their forward and backward
thinking. To enable Large Language Models (LLMs) to perform reverse thinking,
we introduce Reverse-Enhanced Thinking (RevThink), a framework composed of data
augmentation and learning objectives. In RevThink, we augment the dataset by
collecting structured forward-backward reasoning from a teacher model,
consisting of: (1) the original question, (2) forward reasoning, (3) backward
question, and (4) backward reasoning. We then employ three objectives to train
a smaller student model in a multi-task learning fashion: (a) generate forward
reasoning from a question, (b) generate a backward question from a question,
and (c) generate backward reasoning from the backward question. Experiments
across 12 datasets covering commonsense, math, and logical reasoning show an
average 13.53% improvement over the student model's zero-shot performance and a
6.84% improvement over the strongest knowledge distillation baselines.
Moreover, our method demonstrates sample efficiency -- using only 10% of the
correct forward reasoning from the training data, it outperforms a standard
fine-tuning method trained on 10x more forward reasoning. RevThink also
exhibits strong generalization to out-of-distribution held-out datasets.
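The data-augmentation and multi-task setup described in the abstract can be sketched as building three instruction-tuning instances from each teacher-augmented example. This is a minimal illustrative sketch, not the paper's implementation: the field names, prompt templates, and the toy example are assumptions.

```python
# Sketch of RevThink-style multi-task data construction: from one
# teacher-augmented example containing (1) the original question,
# (2) forward reasoning, (3) the backward question, and (4) backward
# reasoning, build the three student training instances:
#   (a) question -> forward reasoning
#   (b) question -> backward question
#   (c) backward question -> backward reasoning
# Field names and prompt wording are illustrative assumptions.

def build_revthink_instances(example):
    q = example["question"]              # (1) original question
    fwd = example["forward_reasoning"]   # (2) forward reasoning
    bq = example["backward_question"]    # (3) backward question
    bwd = example["backward_reasoning"]  # (4) backward reasoning
    return [
        # (a) generate forward reasoning from the question
        {"input": f"Question: {q}\nReason step by step.", "target": fwd},
        # (b) generate the backward question from the question
        {"input": f"Question: {q}\nWrite the reverse question.", "target": bq},
        # (c) generate backward reasoning from the backward question
        {"input": f"Question: {bq}\nReason step by step.", "target": bwd},
    ]

example = {
    "question": "Tom has 3 apples and buys 2 more. How many does he have now?",
    "forward_reasoning": "3 + 2 = 5, so Tom has 5 apples.",
    "backward_question": "Tom has 5 apples after buying 2 more. How many did he start with?",
    "backward_reasoning": "5 - 2 = 3, so Tom started with 3 apples.",
}
instances = build_revthink_instances(example)
```

A student model fine-tuned on all three instance types jointly learns the consistency between forward and backward reasoning that the abstract attributes to the method.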