Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

Abstract

We study a novel language model architecture that is capable of scaling test-time computation by implicitly reasoning in latent space. Our model works by iterating a recurrent block, thereby unrolling to arbitrary depth at test time. This stands in contrast to mainstream reasoning models that scale up compute by producing more tokens. Unlike approaches based on chain-of-thought, our approach does not require any specialized training data, can work with small context windows, and can capture types of reasoning that are not easily represented in words. We scale a proof-of-concept model to 3.5 billion parameters and 800 billion tokens. We show that the resulting model can improve its performance on reasoning benchmarks, sometimes dramatically, up to a computation load equivalent to 50 billion parameters.
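The idea of iterating one recurrent block to a depth chosen at test time can be illustrated with a toy sketch. This is not the model described above: the abstract gives no layer details, so the weight layout (`w_state`, `w_input`), the zero initial state, the `tanh` nonlinearity, and all function names are assumptions made only for illustration. The sketch shows one property the abstract does state: the same weights are reused at every iteration, so compute grows with the number of iterations while the parameter count stays fixed.

```python
import numpy as np


def make_block(dim, seed=0):
    """Create weights for one toy recurrent block (hypothetical layout)."""
    rng = np.random.default_rng(seed)
    scale = 1.0 / np.sqrt(dim)
    return {
        "w_state": rng.normal(0.0, scale, (dim, dim)),  # acts on the latent state
        "w_input": rng.normal(0.0, scale, (dim, dim)),  # re-injects the embedded input
    }


def recurrent_block(params, state, embedded):
    """One iteration: combine the current latent state with the embedded input."""
    return np.tanh(state @ params["w_state"] + embedded @ params["w_input"])


def forward(params, embedded, num_iterations):
    """Unroll the same block num_iterations times; the weights are shared."""
    state = np.zeros_like(embedded)  # assumed initial latent state
    for _ in range(num_iterations):
        state = recurrent_block(params, state, embedded)
    return state


def count_parameters(params):
    """Total number of weights, independent of how deep the model is unrolled."""
    return sum(w.size for w in params.values())


params = make_block(dim=8)
embedded = np.random.default_rng(1).normal(size=(4, 8))  # 4 tokens, 8 features

shallow = forward(params, embedded, num_iterations=1)   # little test-time compute
deep = forward(params, embedded, num_iterations=32)     # more compute, same weights
```

Here `shallow` and `deep` come from the same `params`; only `num_iterations` changes, which is the knob that scales test-time compute in this toy setting without producing additional tokens.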
