Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions

Abstract

OpenAI o1 has recently sparked a surge of interest in the study of large reasoning models (LRMs). Building on this momentum, Marco-o1 not only focuses on disciplines with standard answers, such as mathematics, physics, and coding, which are well suited to reinforcement learning (RL), but also places greater emphasis on open-ended resolutions. We aim to address the question: "Can the o1 model effectively generalize to broader domains where clear standards are absent and rewards are challenging to quantify?" Marco-o1 is powered by Chain-of-Thought (CoT) fine-tuning, Monte Carlo Tree Search (MCTS), reflection mechanisms, and innovative reasoning strategies, all optimized for complex real-world problem-solving tasks.
