Token-Budget-Aware LLM Reasoning
December 24, 2024
Authors: Tingxu Han, Chunrong Fang, Shiyu Zhao, Shiqing Ma, Zhenyu Chen, Zhenting Wang
cs.AI
Abstract
Reasoning is critical for large language models (LLMs) to excel in a wide
range of tasks. While methods like Chain-of-Thought (CoT) reasoning enhance LLM
performance by decomposing problems into intermediate steps, they also incur
significant overhead in token usage, leading to increased costs. We find that
the reasoning process of current LLMs is unnecessarily lengthy and can be
compressed by including a reasonable token budget in the prompt; however, the
choice of token budget plays a crucial role in the actual compression effectiveness.
We then propose a token-budget-aware LLM reasoning framework, which dynamically
estimates token budgets for different problems based on reasoning complexity
and uses the estimated token budgets to guide the reasoning process.
Experiments show that our method effectively reduces token costs in CoT
reasoning with only a slight performance reduction, offering a practical
solution to balance efficiency and accuracy in LLM reasoning. Code:
https://github.com/GeniusHTX/TALE.
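Below is a minimal sketch of how such token-budget-aware prompting could look in practice, assuming an OpenAI-style chat API. The model name, prompt wording, budget-estimation query, and fallback value are illustrative assumptions rather than the paper's exact implementation; see the linked repository for the authors' code.

```python
# Illustrative sketch (not the paper's exact prompts): first ask the model to
# estimate a reasonable token budget for a question, then ask it to reason
# within that budget.
from openai import OpenAI

client = OpenAI()          # assumes OPENAI_API_KEY is set in the environment
MODEL = "gpt-4o-mini"      # hypothetical model choice; any chat model works


def estimate_budget(question: str) -> int:
    """Ask the model to estimate how many tokens its reasoning would need."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{
            "role": "user",
            "content": (
                "Estimate how many tokens you would need to reason through "
                f"the following problem. Reply with a single integer only.\n\n{question}"
            ),
        }],
    )
    try:
        return int(resp.choices[0].message.content.strip())
    except ValueError:
        return 256  # assumed fallback budget if the reply is not a clean integer


def budgeted_cot(question: str) -> str:
    """Run Chain-of-Thought reasoning constrained by the estimated budget."""
    budget = estimate_budget(question)
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{
            "role": "user",
            "content": (
                f"{question}\n\nLet's think step by step and use less than "
                f"{budget} tokens."
            ),
        }],
    )
    return resp.choices[0].message.content


if __name__ == "__main__":
    print(budgeted_cot("If a train travels 120 km in 1.5 hours, what is its average speed?"))
```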