Test-time Computing: from System-1 Thinking to System-2 Thinking

Abstract

The o1 model's strong performance on complex reasoning shows that scaling test-time computing can further unlock a model's potential and enable powerful System-2 thinking. However, a comprehensive survey of test-time computing scaling is still lacking. We trace the concept of test-time computing back to System-1 models. In System-1 models, test-time computing addresses distribution shifts and improves robustness and generalization through parameter updating, input modification, representation editing, and output calibration. In System-2 models, it enhances reasoning ability on complex problems through repeated sampling, self-correction, and tree search. We organize this survey along the trend from System-1 to System-2 thinking, highlighting the key role of test-time computing in the transition from System-1 models to weak System-2 models, and then to strong System-2 models. We also point out a few possible future directions.
