Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models
March 28, 2025
Authors: Zhanke Zhou, Zhaocheng Zhu, Xuan Li, Mikhail Galkin, Xiao Feng, Sanmi Koyejo, Jian Tang, Bo Han
cs.AI
Abstract
Numerous applications of large language models (LLMs) rely on their ability to perform step-by-step reasoning. However, the reasoning behavior of LLMs remains poorly understood, posing challenges to research, development, and safety. To address this gap, we introduce the landscape of thoughts, the first visualization tool for users to inspect the reasoning paths of chain-of-thought and its derivatives on any multi-choice dataset. Specifically, we represent the states in a reasoning path as feature vectors that quantify their distances to all answer choices. These features are then visualized in two-dimensional plots using t-SNE. Qualitative and quantitative analysis with the landscape of thoughts effectively distinguishes between strong and weak models, correct and incorrect answers, as well as different reasoning tasks. It also uncovers undesirable reasoning patterns, such as low consistency and high uncertainty. Additionally, users can adapt our tool to a model that predicts the property they observe. We showcase this advantage by adapting our tool to a lightweight verifier that evaluates the correctness of reasoning paths. The code is publicly available at: https://github.com/tmlr-group/landscape-of-thoughts.
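To make the pipeline described above concrete, here is a minimal sketch of its two stages: per-state feature vectors of distances to the answer choices, followed by t-SNE projection to 2D. The distance values below are random placeholders standing in for model-derived features, and the sizes (`num_paths`, `states_per_path`, `num_choices`) are made-up illustrative parameters; the authors' actual feature computation is in the released repository.

```python
# Sketch of the abstract's pipeline: state-to-choice distance features -> t-SNE.
# Placeholder data only; the real features come from the tool at
# https://github.com/tmlr-group/landscape-of-thoughts.
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
num_paths, states_per_path, num_choices = 20, 8, 4  # hypothetical sizes

# Each reasoning state is a feature vector whose entries quantify its
# distance to each answer choice (e.g., A, B, C, D). Random stand-ins here.
features = rng.random((num_paths * states_per_path, num_choices))

# Project the per-state feature vectors into two dimensions for plotting.
coords = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)

plt.scatter(coords[:, 0], coords[:, 1], s=10)
plt.title("Landscape of thoughts (placeholder features)")
plt.show()
```

With real model-derived distances, each point is one intermediate reasoning state, so states from the same path trace a trajectory across the landscape toward (or away from) the correct answer choice.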