Token-Budget-Aware LLM Reasoning
December 24, 2024
Authors: Tingxu Han, Chunrong Fang, Shiyu Zhao, Shiqing Ma, Zhenyu Chen, Zhenting Wang
cs.AI
Abstract
Reasoning is critical for large language models (LLMs) to excel in a wide
range of tasks. While methods like Chain-of-Thought (CoT) reasoning enhance LLM
performance by decomposing problems into intermediate steps, they also incur
significant overhead in token usage, leading to increased costs. We find that
the reasoning process of current LLMs is unnecessarily lengthy and can be
compressed by including a reasonable token budget in the prompt; however, the
choice of token budget plays a crucial role in the actual compression effectiveness.
We then propose a token-budget-aware LLM reasoning framework, which dynamically
estimates token budgets for different problems based on reasoning complexity
and uses the estimated token budgets to guide the reasoning process.
Experiments show that our method effectively reduces token costs in CoT
reasoning with only a slight performance reduction, offering a practical
solution to balance efficiency and accuracy in LLM reasoning. Code:
https://github.com/GeniusHTX/TALE.
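Below is a minimal sketch of how such token-budget-aware prompting could look in practice, assuming an OpenAI-style chat API. The model name, prompt wording, budget-estimation query, and fallback value are illustrative assumptions rather than the paper's exact implementation; see the linked repository for the authors' code.

```python
# Illustrative sketch (not the paper's exact prompts): first ask the model to
# estimate a reasonable token budget for a question, then ask it to reason
# within that budget.
from openai import OpenAI

client = OpenAI()          # assumes OPENAI_API_KEY is set in the environment
MODEL = "gpt-4o-mini"      # hypothetical model choice; any chat model works


def estimate_budget(question: str) -> int:
    """Ask the model to estimate how many tokens its reasoning would need."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{
            "role": "user",
            "content": (
                "Estimate how many tokens you would need to reason through "
                f"the following problem. Reply with a single integer only.\n\n{question}"
            ),
        }],
    )
    try:
        return int(resp.choices[0].message.content.strip())
    except ValueError:
        return 256  # assumed fallback budget if the reply is not a clean integer


def budgeted_cot(question: str) -> str:
    """Run Chain-of-Thought reasoning constrained by the estimated budget."""
    budget = estimate_budget(question)
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{
            "role": "user",
            "content": (
                f"{question}\n\nLet's think step by step and use less than "
                f"{budget} tokens."
            ),
        }],
    )
    return resp.choices[0].message.content


if __name__ == "__main__":
    print(budgeted_cot("If a train travels 120 km in 1.5 hours, what is its average speed?"))
```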