Token-Budget-Aware LLM Reasoning

Abstract

Reasoning is critical for large language models (LLMs) to excel in a wide range of tasks. While methods like Chain-of-Thought (CoT) reasoning enhance LLM performance by decomposing problems into intermediate steps, they also incur significant overhead in token usage, leading to increased costs. We find that the reasoning process of current LLMs is unnecessarily lengthy and can be compressed by including a reasonable token budget in the prompt, although the choice of token budget plays a crucial role in the actual compression effectiveness. We then propose a token-budget-aware LLM reasoning framework, which dynamically estimates token budgets for different problems based on reasoning complexity and uses the estimated token budgets to guide the reasoning process. Experiments show that our method effectively reduces token costs in CoT reasoning with only a slight performance reduction, offering a practical solution to balance efficiency and accuracy in LLM reasoning. Code: https://github.com/GeniusHTX/TALE.
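The two steps described in the abstract (estimate a per-problem token budget, then include it in the prompt) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the prompt wording, the function names, and the word-count heuristic used as an estimator are all assumptions made here for demonstration. The actual estimator is available in the linked repository.

```python
from typing import Callable


def build_budget_prompt(question: str, budget: int) -> str:
    """Append a token-budget instruction to a chain-of-thought prompt."""
    if budget <= 0:
        raise ValueError("budget must be a positive number of tokens")
    # Assumed wording: the abstract only states that a budget is
    # included in the prompt, not the exact phrasing.
    return (
        f"{question}\n"
        f"Let's think step by step and use less than {budget} tokens."
    )


def toy_budget_estimator(
    question: str, base: int = 20, per_word: int = 2, cap: int = 200
) -> int:
    """Hypothetical stand-in for a complexity-based budget estimator.

    Longer questions receive a larger budget, up to a fixed cap. A real
    estimator would reflect reasoning complexity, not question length.
    """
    return min(cap, base + per_word * len(question.split()))


def budget_aware_prompt(
    question: str,
    estimator: Callable[[str], int] = toy_budget_estimator,
) -> str:
    """Estimate a budget for the question, then build the prompt with it."""
    return build_budget_prompt(question, estimator(question))


if __name__ == "__main__":
    print(budget_aware_prompt("What is 2 + 3?"))
```

Passing the estimator as a parameter keeps the prompt construction independent of how the budget is chosen, so the toy heuristic can be swapped for a better estimator without touching the rest of the code.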
