
TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding

February 26, 2025
Authors: Max Ku, Thomas Chong, Jonathan Leung, Krish Shah, Alvin Yu, Wenhu Chen
cs.AI

Abstract

Understanding domain-specific theorems often requires more than just text-based reasoning; effective communication through structured visual explanations is crucial for deeper comprehension. While large language models (LLMs) demonstrate strong performance in text-based theorem reasoning, their ability to generate coherent and pedagogically meaningful visual explanations remains an open challenge. In this work, we introduce TheoremExplainAgent, an agentic approach for generating long-form theorem explanation videos (over 5 minutes) using Manim animations. To systematically evaluate multimodal theorem explanations, we propose TheoremExplainBench, a benchmark covering 240 theorems across multiple STEM disciplines, along with 5 automated evaluation metrics. Our results reveal that agentic planning is essential for generating detailed long-form videos, and the o3-mini agent achieves a success rate of 93.8% and an overall score of 0.77. However, our quantitative and qualitative studies show that most of the videos produced exhibit minor issues with visual element layout. Furthermore, multimodal explanations expose deeper reasoning flaws that text-based explanations fail to reveal, highlighting the importance of multimodal explanations.
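
To make the output format concrete: the paper's agent emits Manim code that is rendered into explanation videos. The snippet below is a minimal, hypothetical sketch of what such a Manim (Community Edition) scene can look like; the class name, theorem choice, and animation steps are illustrative assumptions, not the agent's actual generated code.

```python
# Minimal illustrative Manim CE scene (hypothetical, not from the paper).
from manim import Scene, Text, MathTex, Write, FadeIn, UP

class PythagoreanTheoremScene(Scene):  # hypothetical scene name
    def construct(self):
        # Title card for the theorem being explained.
        title = Text("Pythagorean Theorem")
        self.play(Write(title))
        self.wait(1)
        self.play(title.animate.to_edge(UP))

        # Theorem statement rendered as LaTeX.
        statement = MathTex("a^2 + b^2 = c^2")
        self.play(FadeIn(statement))
        self.wait(2)
```

Such a scene would be rendered with the standard Manim CLI, e.g. `manim -pql scene.py PythagoreanTheoremScene`; a full long-form video as described in the paper would chain many such scenes with narration.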
