Sleep-time Compute: Beyond Inference Scaling at Test-time

Abstract

Scaling test-time compute has emerged as a key ingredient for enabling large language models (LLMs) to solve difficult problems, but it comes with high latency and inference cost. We introduce sleep-time compute, which allows models to "think" offline about contexts before queries are presented: by anticipating what queries users might ask and pre-computing useful quantities, we can significantly reduce the compute requirements at test time. To demonstrate the efficacy of our method, we create modified versions of two reasoning tasks: Stateful GSM-Symbolic and Stateful AIME. We find that sleep-time compute can reduce the amount of test-time compute needed to achieve the same accuracy by roughly 5x on Stateful GSM-Symbolic and Stateful AIME, and that scaling sleep-time compute can further increase accuracy by up to 13% on Stateful GSM-Symbolic and 18% on Stateful AIME. Furthermore, we introduce Multi-Query GSM-Symbolic, which extends GSM-Symbolic by including multiple related queries per context. By amortizing sleep-time compute across related queries about the same context using Multi-Query GSM-Symbolic, we can decrease the average cost per query by 2.5x. We then conduct additional analysis to understand when sleep-time compute is most effective, finding that the predictability of the user query is well correlated with the efficacy of sleep-time compute. Finally, we conduct a case study applying sleep-time compute to a realistic agentic SWE task.
