Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning

December 20, 2024
Authors: Sungjin Park, Xiao Liu, Yeyun Gong, Edward Choi
cs.AI

Abstract

Despite recent advances in large language models, open-source models often struggle to consistently perform well on complex reasoning tasks. Existing ensemble methods, whether applied at the token or output levels, fail to address these challenges. In response, we present Language model Ensemble with Monte Carlo Tree Search (LE-MCTS), a novel framework for process-level ensembling of language models. LE-MCTS formulates step-by-step reasoning with an ensemble of language models as a Markov decision process. In this framework, states represent intermediate reasoning paths, while actions consist of generating the next reasoning step using one of the language models selected from a predefined pool. Guided by a process-based reward model, LE-MCTS performs a tree search over the reasoning steps generated by different language models, identifying the most accurate reasoning chain. Experimental results on five mathematical reasoning benchmarks demonstrate that our approach outperforms both single language model decoding algorithms and language model ensemble methods. Notably, LE-MCTS improves performance by 3.6% and 4.3% on the MATH and MQA datasets, respectively, highlighting its effectiveness in solving complex reasoning problems.
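The abstract's formulation can be illustrated with a minimal sketch: states are partial reasoning chains, each expansion asks every model in a pool for a candidate next step, and a process reward model (PRM) scores partial chains to guide a UCT-style tree search. This is an illustrative toy, not the authors' implementation; `Node`, `le_mcts`, and the `models`/`prm` callables are hypothetical stand-ins for real language models and a trained PRM.

```python
import math
import random

class Node:
    """A search-tree node whose state is a partial reasoning chain."""
    def __init__(self, steps, parent=None):
        self.steps = steps        # list of reasoning steps so far (the state)
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0          # accumulated process reward

def uct(node, c=1.4):
    # UCT score: average reward plus an exploration bonus for rarely
    # visited children; unvisited children are always tried first.
    if node.visits == 0:
        return float("inf")
    return (node.value / node.visits
            + c * math.sqrt(math.log(node.parent.visits) / node.visits))

def all_nodes(root):
    """Collect every node in the tree (for extracting the best chain)."""
    stack, out = [root], []
    while stack:
        n = stack.pop()
        out.append(n)
        stack.extend(n.children)
    return out

def le_mcts(models, prm, root_steps, max_depth=3, iters=50, seed=0):
    """Toy process-level ensemble search.

    models: callables mapping a partial chain -> next reasoning step
            (stand-ins for LMs in the predefined pool).
    prm:    callable scoring a partial chain in [0, 1]
            (stand-in for a process reward model).
    """
    random.seed(seed)
    root = Node(list(root_steps))
    for _ in range(iters):
        # 1. Selection: descend by UCT until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=uct)
        # 2. Expansion: one child per model in the pool, each proposing
        #    its own next step from the current partial chain.
        if len(node.steps) < max_depth:
            for m in models:
                child = Node(node.steps + [m(node.steps)], parent=node)
                node.children.append(child)
            node = random.choice(node.children)
        # 3. Evaluation: process reward on the (partial) chain.
        reward = prm(node.steps)
        # 4. Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Return the highest-scoring complete chain discovered.
    best, _ = max(
        ((n.steps, prm(n.steps)) for n in all_nodes(root)
         if len(n.steps) == max_depth),
        key=lambda t: t[1],
        default=(root.steps, prm(root.steps)),
    )
    return best
```

With two toy "models" (one emitting useful steps, one emitting distractors) and a PRM that scores the fraction of useful steps, the search converges on the all-useful chain; in the paper's setting the pool would contain real open-source LMs and the PRM would be a learned step-level scorer.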


December 25, 2024