Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions

Abstract

OpenAI's o1 has recently sparked a surge of interest in the study of large reasoning models (LRMs). Building on this momentum, Marco-o1 focuses not only on disciplines with standard answers, such as mathematics, physics, and coding, which are well suited to reinforcement learning (RL), but also places greater emphasis on open-ended resolutions. We aim to address the question: "Can the o1 model effectively generalize to broader domains where clear standards are absent and rewards are challenging to quantify?" Marco-o1 is powered by Chain-of-Thought (CoT) fine-tuning, Monte Carlo Tree Search (MCTS), reflection mechanisms, and innovative reasoning strategies, and it is optimized for complex real-world problem-solving tasks.
