Test-time Computing: from System-1 Thinking to System-2 Thinking

Abstract

The remarkable performance of the o1 model on complex reasoning demonstrates that test-time computing scaling can further unlock a model's potential, enabling powerful System-2 thinking. However, comprehensive surveys of test-time computing scaling are still lacking. We trace the concept of test-time computing back to System-1 models. In System-1 models, test-time computing addresses distribution shifts and improves robustness and generalization through parameter updating, input modification, representation editing, and output calibration. In System-2 models, it enhances reasoning ability for solving complex problems through repeated sampling, self-correction, and tree search. We organize this survey along the trend from System-1 to System-2 thinking, highlighting the key role of test-time computing in the transition from System-1 models to weak System-2 models, and then to strong System-2 models. We also point out a few possible future directions.
