Reward-Guided Speculative Decoding for Efficient LLM Reasoning

Abstract

We introduce Reward-Guided Speculative Decoding (RSD), a novel framework for improving the efficiency of inference in large language models (LLMs). RSD combines a lightweight draft model with a more powerful target model and incorporates a controlled bias that prioritizes high-reward outputs, in contrast to existing speculative decoding methods, which enforce strict unbiasedness. RSD employs a process reward model to evaluate intermediate decoding steps and dynamically decide whether to invoke the target model, optimizing the trade-off between computational cost and output quality. We theoretically demonstrate that a threshold-based mixture strategy achieves an optimal balance between resource utilization and performance. Extensive evaluations on challenging reasoning benchmarks, including Olympiad-level tasks, show that RSD delivers significant efficiency gains over decoding with the target model alone (up to 4.4x fewer FLOPs), while achieving significantly better accuracy than parallel decoding methods on average (up to +3.5). These results highlight RSD as a robust and cost-effective approach for deploying LLMs in resource-intensive scenarios.
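The reward-gated control flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `rsd_decode`, `draft_step`, `target_step`, `reward`, `threshold`, and the `<eos>` stop marker are hypothetical, and accepting a draft step whenever its score is at or above the threshold is an assumption about how the threshold-based rule might be applied.

```python
from typing import Callable, List, Tuple

# Hypothetical signatures for illustration only: a step function maps the
# steps generated so far to the next step; a reward function scores a
# candidate step given the steps so far.
StepFn = Callable[[List[str]], str]
RewardFn = Callable[[List[str], str], float]


def rsd_decode(draft_step: StepFn, target_step: StepFn, reward: RewardFn,
               threshold: float, max_steps: int,
               stop: str = "<eos>") -> Tuple[List[str], int]:
    """Keep a draft step when its reward reaches the threshold; otherwise
    regenerate that step with the (more expensive) target model."""
    steps: List[str] = []
    target_calls = 0
    for _ in range(max_steps):
        candidate = draft_step(steps)
        if reward(steps, candidate) < threshold:
            # Low-reward draft step: fall back to the target model.
            candidate = target_step(steps)
            target_calls += 1
        steps.append(candidate)
        if candidate == stop:
            break
    return steps, target_calls


# Toy stand-ins so the sketch runs without any real model.
draft = lambda steps: f"draft-{len(steps)}"
target = lambda steps: f"target-{len(steps)}"
# Pretend the reward model scores every second draft step poorly.
score = lambda steps, cand: 0.2 if len(steps) % 2 else 0.9

steps, calls = rsd_decode(draft, target, score, threshold=0.5, max_steps=4)
```

In this toy run the target model is invoked only for the two low-scoring steps, which is the intuition behind the compute savings: raising `threshold` shifts more steps to the target model, and lowering it keeps more draft steps.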


