Reward-Guided Speculative Decoding for Efficient LLM Reasoning
January 31, 2025
Authors: Baohao Liao, Yuhui Xu, Hanze Dong, Junnan Li, Christof Monz, Silvio Savarese, Doyen Sahoo, Caiming Xiong
cs.AI
Abstract
We introduce Reward-Guided Speculative Decoding (RSD), a novel framework
aimed at improving the efficiency of inference in large language models (LLMs).
RSD synergistically combines a lightweight draft model with a more powerful
target model, incorporating a controlled bias to prioritize high-reward
outputs, in contrast to existing speculative decoding methods that enforce
strict unbiasedness. RSD employs a process reward model to evaluate
intermediate decoding steps and dynamically decide whether to invoke the target
model, optimizing the trade-off between computational cost and output quality.
We theoretically demonstrate that a threshold-based mixture strategy achieves
an optimal balance between resource utilization and performance. Extensive
evaluations on challenging reasoning benchmarks, including Olympiad-level
tasks, show that RSD delivers significant efficiency gains over decoding with
the target model alone (up to 4.4x fewer FLOPs), while achieving significantly
better accuracy than parallel decoding methods on average (up to +3.5). These
results highlight RSD as a robust and cost-effective approach for deploying
LLMs in resource-intensive scenarios.
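To make the control flow of the threshold-based mixture concrete, below is a minimal Python sketch of the decoding loop described in the abstract: a lightweight draft model proposes each reasoning step, a process reward model scores it, and the target model is invoked only when the score falls below a threshold. The function names (draft_step, target_step, score_step), the threshold value, and the step-level text interface are illustrative assumptions, not the paper's implementation.

```python
# Sketch of reward-guided speculative decoding (RSD), assuming a step-level
# text interface. All callables and the threshold are hypothetical stand-ins.

from typing import Callable, List


def rsd_generate(
    prompt: str,
    draft_step: Callable[[str], str],         # cheap draft model proposes the next step
    target_step: Callable[[str], str],        # stronger target model regenerates a step
    score_step: Callable[[str, str], float],  # process reward model scores a candidate step
    threshold: float = 0.7,                   # acceptance threshold (assumed value)
    max_steps: int = 32,
    stop_token: str = "<eos>",
) -> List[str]:
    """Generate reasoning steps, keeping draft steps whose process reward clears
    the threshold and falling back to the target model otherwise."""
    context = prompt
    accepted: List[str] = []
    for _ in range(max_steps):
        candidate = draft_step(context)        # propose with the lightweight model
        if score_step(context, candidate) < threshold:
            candidate = target_step(context)   # low reward: invoke the target model
        accepted.append(candidate)
        context += candidate
        if stop_token in candidate:
            break
    return accepted


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end; real models would replace these.
    draft = lambda ctx: " draft-step"
    target = lambda ctx: " target-step<eos>"
    reward = lambda ctx, step: 0.4             # always below threshold -> fall back
    print(rsd_generate("Q: 2+2=?", draft, target, reward))
```

The intended effect, per the abstract, is that the expensive target model is consulted only when the process reward model flags a draft step as low quality, which is where the reported FLOPs savings come from.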