MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning

March 10, 2025
作者: Fanqing Meng, Lingxiao Du, Zongkai Liu, Zhixiang Zhou, Quanfeng Lu, Daocheng Fu, Botian Shi, Wenhai Wang, Junjun He, Kaipeng Zhang, Ping Luo, Yu Qiao, Qiaosheng Zhang, Wenqi Shao
cs.AI

Abstract

We present MM-Eureka, a multimodal reasoning model that successfully extends large-scale rule-based reinforcement learning (RL) to multimodal reasoning. While rule-based RL has shown remarkable success in improving LLMs' reasoning abilities in text domains, its application to multimodal settings has remained challenging. Our work reproduces key characteristics of text-based RL systems such as DeepSeek-R1 in the multimodal space, including steady increases in accuracy reward and response length, and the emergence of reflection behaviors. We demonstrate that both instruction-tuned and pre-trained models can develop strong multimodal reasoning capabilities through rule-based RL without supervised fine-tuning, showing superior data efficiency compared to alternative approaches. To foster further research in this area, we open-source our complete pipeline, including all code, models, and data, at https://github.com/ModalMinds/MM-EUREKA
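To make the "rule-based" reward concrete: in R1-style training, the reward is computed by simple deterministic rules rather than a learned reward model, typically an accuracy check on the extracted final answer plus a format check on the reasoning structure. The sketch below illustrates this idea under assumed conventions (a `\boxed{...}` final answer and `<think>...</think>` reasoning tags); it is not the paper's exact implementation.

```python
import re


def accuracy_reward(response: str, ground_truth: str) -> float:
    """Rule-based accuracy reward: 1.0 if the final boxed answer
    exactly matches the ground truth, else 0.0.

    Assumes (hypothetically) that the model emits its final answer
    as \\boxed{...}, a common convention in math-reasoning RL setups.
    """
    match = re.search(r"\\boxed\{([^}]*)\}", response)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == ground_truth.strip() else 0.0


def format_reward(response: str) -> float:
    """Rule-based format reward: 1.0 if the response wraps its
    reasoning in <think>...</think> tags (an assumed R1-style
    convention), else 0.0."""
    return 1.0 if re.search(r"<think>.*?</think>", response, re.DOTALL) else 0.0


def total_reward(response: str, ground_truth: str) -> float:
    """Combine the rule-based signals; the 0.5 weight on format
    is an illustrative choice, not a reported hyperparameter."""
    return accuracy_reward(response, ground_truth) + 0.5 * format_reward(response)
```

Because both checks are pure string rules, the reward is cheap, reproducible, and immune to reward-model hacking, which is what makes scaling this recipe to large multimodal runs tractable.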
