URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics
January 8, 2025
Authors: Ruilin Luo, Zhuofan Zheng, Yifan Wang, Yiyao Yu, Xinzhe Ni, Zicheng Lin, Jin Zeng, Yujiu Yang
cs.AI
Abstract
Chain-of-thought (CoT) reasoning has been widely applied in the mathematical
reasoning of Large Language Models (LLMs). Recently, the introduction of
derivative process supervision on CoT trajectories has sparked discussions on
enhancing scaling capabilities during test time, thereby boosting the potential
of these models. However, in multimodal mathematical reasoning, the scarcity of
high-quality CoT training data has hindered existing models from achieving
high-precision CoT reasoning and has limited the realization of reasoning
potential during test time. In this work, we propose a three-module synthesis
strategy that integrates CoT distillation, trajectory-format rewriting, and
format unification. It results in a high-quality CoT reasoning instruction
fine-tuning dataset in multimodal mathematics, MMathCoT-1M. We comprehensively
validate the state-of-the-art (SOTA) performance of the trained URSA-7B model
on multiple multimodal mathematical benchmarks. For test-time scaling, we
introduce a data synthesis strategy that automatically generates process
annotation datasets, known as DualMath-1.1M, focusing on both interpretation
and logic. By further training URSA-7B on DualMath-1.1M, we transition from CoT
reasoning capabilities to robust supervision abilities. The trained URSA-RM-7B
acts as a verifier, effectively enhancing the performance of URSA-7B at test
time. URSA-RM-7B also demonstrates excellent out-of-distribution (OOD)
verifying capabilities, showcasing its generalization. Model weights, training
data, and code will be open-sourced.
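The verifier-driven test-time scaling described above amounts to best-of-N selection: sample several CoT solutions from the policy model and keep the one the reward model scores highest. A minimal sketch follows; `toy_score` is a hypothetical placeholder standing in for a real process reward model such as URSA-RM-7B.

```python
def best_of_n(candidates, score_fn):
    """Return the candidate CoT solution with the highest verifier score.

    candidates: reasoning strings sampled from the policy model (e.g. URSA-7B)
    score_fn:   stand-in for a reward model such as URSA-RM-7B
    """
    return max(candidates, key=score_fn)


def toy_score(solution: str) -> float:
    # Placeholder scorer for illustration only: prefers more detailed
    # chains by word count. A real verifier scores each reasoning step.
    return float(len(solution.split()))


samples = [
    "Answer: 12",
    "Area = 3 * 4 = 12, so the answer is 12",
    "Step 1: width is 3. Step 2: height is 4. Step 3: area = 3 * 4 = 12",
]
best = best_of_n(samples, toy_score)
```

In practice the N samples come from temperature sampling the fine-tuned model, and the verifier's per-step scores are aggregated (e.g. by minimum or product) before the arg-max.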