Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions

November 21, 2024
Authors: Yu Zhao, Huifeng Yin, Bo Zeng, Hao Wang, Tianqi Shi, Chenyang Lyu, Longyue Wang, Weihua Luo, Kaifu Zhang
cs.AI

Abstract

Currently, OpenAI o1 has sparked a surge of interest in the study of large reasoning models (LRMs). Building on this momentum, Marco-o1 focuses not only on disciplines with standard answers that are well-suited to reinforcement learning (RL), such as mathematics, physics, and coding, but places even greater emphasis on open-ended resolutions. We aim to address the question: "Can the o1 model effectively generalize to broader domains where clear standards are absent and rewards are challenging to quantify?" Marco-o1 is powered by Chain-of-Thought (CoT) fine-tuning, Monte Carlo Tree Search (MCTS), reflection mechanisms, and innovative reasoning strategies, all optimized for complex real-world problem-solving tasks.
