Token-Budget-Aware LLM Reasoning
December 24, 2024
Authors: Tingxu Han, Chunrong Fang, Shiyu Zhao, Shiqing Ma, Zhenyu Chen, Zhenting Wang
cs.AI
Abstract
Reasoning is critical for large language models (LLMs) to excel in a wide
range of tasks. While methods like Chain-of-Thought (CoT) reasoning enhance LLM
performance by decomposing problems into intermediate steps, they also incur
significant overhead in token usage, leading to increased costs. We find that
the reasoning process of current LLMs is unnecessarily lengthy and it can be
compressed by including a reasonable token budget in the prompt, but the choice
of token budget plays a crucial role in the actual compression effectiveness.
We then propose a token-budget-aware LLM reasoning framework, which dynamically
estimates token budgets for different problems based on reasoning complexity
and uses the estimated token budgets to guide the reasoning process.
Experiments show that our method effectively reduces token costs in CoT
reasoning with only a slight performance reduction, offering a practical
solution to balance efficiency and accuracy in LLM reasoning. Code:
https://github.com/GeniusHTX/TALE
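
To make the idea concrete, below is a minimal, hypothetical sketch of budget-aware prompting in the spirit of the abstract: a per-question token budget is estimated first, then stated explicitly inside the CoT prompt. The `query_llm` helper, the prompt wording, and the zero-shot budget estimator are illustrative assumptions, not the authors' TALE implementation.

```python
# Illustrative sketch of token-budget-aware CoT prompting (not the authors' TALE code).
# `query_llm` is a stand-in for whatever chat-completion API you use.

def query_llm(prompt: str) -> str:
    """Placeholder: call your LLM provider here and return the text response."""
    raise NotImplementedError

def budget_aware_prompt(question: str, budget: int) -> str:
    # Core idea from the abstract: still ask for step-by-step reasoning,
    # but cap it with an explicit token budget stated in the prompt itself.
    return (
        f"{question}\n"
        f"Let's think step by step and use less than {budget} tokens."
    )

def estimate_budget(question: str) -> int:
    # One plausible estimator (an assumption for illustration): ask the model
    # itself how many reasoning tokens the question needs. The paper's
    # estimator may differ.
    answer = query_llm(
        f"Task: {question}\n"
        "Estimate how many tokens of reasoning are needed to solve this task. "
        "Reply with a single integer."
    )
    try:
        return max(1, int(answer.strip()))
    except ValueError:
        return 256  # fallback budget if the estimate cannot be parsed

def solve(question: str) -> str:
    budget = estimate_budget(question)
    return query_llm(budget_aware_prompt(question, budget))
```

As the abstract notes, the choice of budget largely determines how much compression is actually achieved, so in a real system the estimation step is the part that deserves the most care.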