A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods

February 3, 2025
Authors: Isha Puri, Shivchander Sudalairaj, Guangxuan Xu, Kai Xu, Akash Srivastava
cs.AI

Abstract

Large language models (LLMs) have achieved significant performance gains via scaling up model sizes and/or data. However, recent evidence suggests diminishing returns from such approaches, motivating scaling the computation spent at inference time. Existing inference-time scaling methods, usually with reward models, cast the task as a search problem, which tends to be vulnerable to reward hacking as a consequence of approximation errors in reward models. In this paper, we instead cast inference-time scaling as a probabilistic inference task and leverage sampling-based techniques to explore the typical set of the state distribution of a state-space model with an approximate likelihood, rather than optimize for its mode directly. We propose a novel inference-time scaling approach by adapting particle-based Monte Carlo methods to this task. Our empirical evaluation demonstrates that our methods have a 4-16x better scaling rate than our deterministic search counterparts on various challenging mathematical reasoning tasks. Using our approach, we show that Qwen2.5-Math-1.5B-Instruct can surpass GPT-4o accuracy in only 4 rollouts, while Qwen2.5-Math-7B-Instruct scales to o1-level accuracy in only 32 rollouts. Our work not only presents an effective method for inference-time scaling, but also connects the rich literature in probabilistic inference with inference-time scaling of LLMs to develop more robust algorithms in future work. Code and further information are available at https://probabilistic-inference-scaling.github.io.
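
As a rough illustration of the idea described in the abstract, the sketch below applies a generic particle filter to step-wise LLM generation: partial rollouts are propagated one reasoning step at a time, weighted by a reward-model score treated as an approximate likelihood, and resampled so compute concentrates on the typical set rather than a single mode. This is a minimal sketch, not the authors' released implementation; `generate_next_step`, `reward_model_score`, and `is_complete` are hypothetical toy stand-ins for an LLM step sampler, a process reward model, and a termination check.

```python
import math
import random

# Hypothetical toy stand-ins (not the authors' code or any real API):
def generate_next_step(partial_solution: str) -> str:
    return " step"  # in practice: sample one reasoning step from the LLM

def reward_model_score(partial_solution: str) -> float:
    return random.random()  # in practice: process reward model score

def is_complete(partial_solution: str) -> bool:
    return len(partial_solution) > 200  # in practice: detect a final answer

def particle_filter(prompt: str, num_particles: int = 4, max_steps: int = 32) -> str:
    """Particle filtering over partial rollouts: propagate, weight, resample."""
    particles = [prompt] * num_particles
    for _ in range(max_steps):
        # Propagate: extend each unfinished particle by one reasoning step.
        particles = [p if is_complete(p) else p + generate_next_step(p)
                     for p in particles]
        if all(is_complete(p) for p in particles):
            break
        # Weight: treat the reward-model score as an approximate log-likelihood
        # (subtract the max for numerical stability before exponentiating).
        scores = [reward_model_score(p) for p in particles]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        # Resample: draw the next population in proportion to the weights.
        particles = random.choices(particles, weights=weights, k=num_particles)
    # Return the completed rollout that the reward model scores highest.
    return max(particles, key=reward_model_score)

if __name__ == "__main__":
    print(particle_filter("Solve: 2 + 2 = ?"))
```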
