Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions
November 21, 2024
Authors: Yu Zhao, Huifeng Yin, Bo Zeng, Hao Wang, Tianqi Shi, Chenyang Lyu, Longyue Wang, Weihua Luo, Kaifu Zhang
cs.AI
Abstract
Currently, OpenAI o1 has sparked a surge of interest in the study of large
reasoning models (LRM). Building on this momentum, Marco-o1 not only focuses on
disciplines with standard answers, such as mathematics, physics, and coding --
which are well-suited for reinforcement learning (RL) -- but also places
greater emphasis on open-ended resolutions. We aim to address the question:
"Can the o1 model effectively generalize to broader domains where clear
standards are absent and rewards are challenging to quantify?" Marco-o1 is
powered by Chain-of-Thought (CoT) fine-tuning, Monte Carlo Tree Search (MCTS),
reflection mechanisms, and innovative reasoning strategies -- optimized for
complex real-world problem-solving tasks.