Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions

Abstract

Currently, OpenAI o1 has sparked a surge of interest in the study of large reasoning models (LRMs). Building on this momentum, Marco-o1 not only focuses on disciplines with standard answers, such as mathematics, physics, and coding, which are well suited to reinforcement learning (RL), but also places greater emphasis on open-ended resolutions. We aim to address the question: "Can the o1 model effectively generalize to broader domains where clear standards are absent and rewards are challenging to quantify?" Marco-o1 is powered by Chain-of-Thought (CoT) fine-tuning, Monte Carlo Tree Search (MCTS), reflection mechanisms, and innovative reasoning strategies, all optimized for complex real-world problem-solving tasks.
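The abstract lists Monte Carlo Tree Search as one of Marco-o1's components but does not say how it is applied. As a rough illustration only, here is a minimal sketch of generic UCT-style MCTS over a sequence of discrete "reasoning steps". The toy task (`ACTIONS`, `DEPTH`, `TARGET`) and the `reward` function are hypothetical stand-ins, not Marco-o1's actual step generator or scoring. In an open-ended setting the reward would have to come from some estimated score, which is the part the abstract describes as hard to quantify.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical toy task (not from Marco-o1): choose DEPTH "reasoning steps",
# each an integer from ACTIONS; a path is fully correct when its steps sum to TARGET.
ACTIONS = (0, 1, 2)
DEPTH = 4
TARGET = 5


def reward(path: tuple[int, ...]) -> float:
    """Hypothetical scorer in [0, 1]: 1.0 for a correct path, lower the further off it is."""
    return 1.0 - abs(sum(path) - TARGET) / (DEPTH * max(ACTIONS))


@dataclass
class Node:
    path: tuple[int, ...]
    parent: Optional["Node"] = None
    children: dict[int, "Node"] = field(default_factory=dict)
    visits: int = 0
    value_sum: float = 0.0

    def is_terminal(self) -> bool:
        return len(self.path) == DEPTH

    def untried(self) -> list[int]:
        return [a for a in ACTIONS if a not in self.children]

    def uct(self, c: float) -> float:
        # UCB1 applied to trees: mean rollout value plus an exploration bonus.
        assert self.parent is not None
        mean = self.value_sum / self.visits
        return mean + c * math.sqrt(math.log(self.parent.visits) / self.visits)


def mcts(iterations: int = 500, c: float = math.sqrt(2), seed: int = 0):
    rng = random.Random(seed)
    root = Node(path=())
    best_path, best_reward = (), -1.0
    for _ in range(iterations):
        # 1. Selection: descend through fully expanded nodes by UCT score.
        node = root
        while not node.is_terminal() and not node.untried():
            node = max(node.children.values(), key=lambda ch: ch.uct(c))
        # 2. Expansion: add one untried child, i.e. one new reasoning step.
        if not node.is_terminal():
            action = rng.choice(node.untried())
            child = Node(path=node.path + (action,), parent=node)
            node.children[action] = child
            node = child
        # 3. Simulation: complete the path with random steps and score it.
        remaining = DEPTH - len(node.path)
        rollout = node.path + tuple(rng.choice(ACTIONS) for _ in range(remaining))
        r = reward(rollout)
        if r > best_reward:
            best_path, best_reward = rollout, r
        # 4. Backpropagation: update statistics from the new node up to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += r
            node = node.parent
    return root, best_path, best_reward


if __name__ == "__main__":
    root, path, score = mcts()
    print(path, score)
```

Returning the best-scoring complete path seen during rollouts is one design choice. The more conventional alternative is to follow the most-visited child at each level starting from the returned `root`.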