Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models
March 28, 2025
Authors: Zhanke Zhou, Zhaocheng Zhu, Xuan Li, Mikhail Galkin, Xiao Feng, Sanmi Koyejo, Jian Tang, Bo Han
cs.AI
Abstract
Numerous applications of large language models (LLMs) rely on their ability to perform step-by-step reasoning. However, the reasoning behavior of LLMs remains poorly understood, posing challenges to research, development, and safety. To address this gap, we introduce the landscape of thoughts, the first visualization tool for users to inspect the reasoning paths of chain-of-thought and its derivatives on any multi-choice dataset. Specifically, we represent the states in a reasoning path as feature vectors that quantify their distances to all answer choices. These features are then visualized in two-dimensional plots using t-SNE. Qualitative and quantitative analysis with the landscape of thoughts effectively distinguishes between strong and weak models, correct and incorrect answers, as well as different reasoning tasks. It also uncovers undesirable reasoning patterns, such as low consistency and high uncertainty. Additionally, users can adapt our tool to a model that predicts the property they observe. We showcase this advantage by adapting our tool to a lightweight verifier that evaluates the correctness of reasoning paths. The code is publicly available at: https://github.com/tmlr-group/landscape-of-thoughts.
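To make the pipeline described above concrete, here is a minimal sketch of its two stages: per-state feature vectors of distances to the answer choices, followed by t-SNE projection to 2D. The distance values below are random placeholders standing in for model-derived features, and the sizes (`num_paths`, `states_per_path`, `num_choices`) are made-up illustrative parameters; the authors' actual feature computation is in the released repository.

```python
# Sketch of the abstract's pipeline: state-to-choice distance features -> t-SNE.
# Placeholder data only; the real features come from the tool at
# https://github.com/tmlr-group/landscape-of-thoughts.
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
num_paths, states_per_path, num_choices = 20, 8, 4  # hypothetical sizes

# Each reasoning state is a feature vector whose entries quantify its
# distance to each answer choice (e.g., A, B, C, D). Random stand-ins here.
features = rng.random((num_paths * states_per_path, num_choices))

# Project the per-state feature vectors into two dimensions for plotting.
coords = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)

plt.scatter(coords[:, 0], coords[:, 1], s=10)
plt.title("Landscape of thoughts (placeholder features)")
plt.show()
```

With real model-derived distances, each point is one intermediate reasoning state, so states from the same path trace a trajectory across the landscape toward (or away from) the correct answer choice.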