Reverse Thinking Makes LLMs Stronger Reasoners

November 29, 2024
Authors: Justin Chih-Yao Chen, Zifeng Wang, Hamid Palangi, Rujun Han, Sayna Ebrahimi, Long Le, Vincent Perot, Swaroop Mishra, Mohit Bansal, Chen-Yu Lee, Tomas Pfister
cs.AI

Abstract

Reverse thinking plays a crucial role in human reasoning. Humans can reason not only from a problem to a solution but also in reverse, i.e., start from the solution and reason towards the problem. This often enhances overall reasoning performance as it enables consistency checks between their forward and backward thinking. To enable Large Language Models (LLMs) to perform reverse thinking, we introduce Reverse-Enhanced Thinking (RevThink), a framework composed of data augmentation and learning objectives. In RevThink, we augment the dataset by collecting structured forward-backward reasoning from a teacher model, consisting of: (1) the original question, (2) forward reasoning, (3) backward question, and (4) backward reasoning. We then employ three objectives to train a smaller student model in a multi-task learning fashion: (a) generate forward reasoning from a question, (b) generate a backward question from a question, and (c) generate backward reasoning from the backward question. Experiments across 12 datasets covering commonsense, math, and logical reasoning show an average 13.53% improvement over the student model's zero-shot performance and a 6.84% improvement over the strongest knowledge distillation baselines. Moreover, our method demonstrates sample efficiency -- using only 10% of the correct forward reasoning from the training data, it outperforms a standard fine-tuning method trained on 10x more forward reasoning. RevThink also exhibits strong generalization to out-of-distribution held-out datasets.
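
The data augmentation and the three objectives map naturally onto three (input, target) pairs per training example. Below is a minimal sketch of that mapping; the field names, instruction strings, and the helper `revthink_pairs` are illustrative assumptions based only on the abstract, not the paper's exact prompts or code.

```python
# Minimal sketch: turn one teacher-augmented RevThink record into the three
# multi-task training pairs described in the abstract. Field names and
# instruction strings are illustrative assumptions, not the paper's prompts.

def revthink_pairs(record: dict) -> list[tuple[str, str]]:
    """Build (input, target) pairs for objectives (a), (b), and (c)."""
    question = record["question"]            # (1) original question
    fwd = record["forward_reasoning"]        # (2) teacher's forward reasoning
    bwd_q = record["backward_question"]      # (3) teacher's backward question
    bwd = record["backward_reasoning"]       # (4) teacher's backward reasoning
    return [
        # (a) generate forward reasoning from the question
        (f"Answer the question, reasoning step by step.\n{question}", fwd),
        # (b) generate a backward question from the question
        (f"Write the reverse of this question.\n{question}", bwd_q),
        # (c) generate backward reasoning from the backward question
        (f"Answer the question, reasoning step by step.\n{bwd_q}", bwd),
    ]


if __name__ == "__main__":
    example = {
        "question": "John has 3 apples and buys 2 more. How many does he have?",
        "forward_reasoning": "3 + 2 = 5, so John has 5 apples.",
        "backward_question": "John has 5 apples after buying 2 more. How many did he start with?",
        "backward_reasoning": "5 - 2 = 3, so John started with 3 apples.",
    }
    for source, target in revthink_pairs(example):
        print(source, "=>", target)
```

Fine-tuning the student on all three pair types jointly is what gives it the forward-backward consistency signal the abstract describes, rather than training on forward reasoning alone.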
