DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning

April 2, 2025
Authors: Sara Vera Marjanović, Arkil Patel, Vaibhav Adlakha, Milad Aghajohari, Parishad BehnamGhader, Mehar Bhatia, Aditi Khandelwal, Austin Kraft, Benno Krojer, Xing Han Lù, Nicholas Meade, Dongchan Shin, Amirhossein Kazemnejad, Gaurav Kamath, Marius Mosbach, Karolina Stańczak, Siva Reddy
cs.AI

Abstract

Large Reasoning Models like DeepSeek-R1 mark a fundamental shift in how LLMs approach complex problems. Instead of directly producing an answer for a given input, DeepSeek-R1 creates detailed multi-step reasoning chains, seemingly "thinking" about a problem before providing an answer. This reasoning process is publicly available to the user, creating endless opportunities for studying the reasoning behaviour of the model and opening up the field of Thoughtology. Starting from a taxonomy of DeepSeek-R1's basic building blocks of reasoning, our analyses on DeepSeek-R1 investigate the impact and controllability of thought length, management of long or confusing contexts, cultural and safety concerns, and the status of DeepSeek-R1 vis-à-vis cognitive phenomena, such as human-like language processing and world modelling. Our findings paint a nuanced picture. Notably, we show DeepSeek-R1 has a 'sweet spot' of reasoning, where extra inference time can impair model performance. Furthermore, we find a tendency for DeepSeek-R1 to persistently ruminate on previously explored problem formulations, obstructing further exploration. We also note strong safety vulnerabilities of DeepSeek-R1 compared to its non-reasoning counterpart, which can also compromise safety-aligned LLMs.

April 11, 2025