o1-Coder: an o1 Replication for Coding

Abstract

This technical report introduces O1-CODER, an attempt to replicate OpenAI's o1 model with a focus on coding tasks. It integrates reinforcement learning (RL) and Monte Carlo Tree Search (MCTS) to enhance the model's System-2 thinking capabilities. The framework includes training a Test Case Generator (TCG) for standardized code testing, using MCTS to generate code data with reasoning processes, and iteratively fine-tuning the policy model to first produce pseudocode and then generate the full code. The report also addresses the opportunities and challenges of deploying o1-like models in real-world applications, suggesting a transition to the System-2 paradigm and highlighting the need for environment state updates. Updated model progress and experimental results will be reported in subsequent versions. All source code, curated datasets, and derived models will be released at https://github.com/ADaM-BJTU/O1-CODER.
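To make the role of standardized code testing concrete, here is a minimal Python sketch of how a set of test cases could score a candidate program. It is an illustration under stated assumptions, not the report's implementation: the abstract does not specify how test results are turned into a training signal, so the pass-rate score, the `pass_rate` function, the `TestCase` type, and the toy candidates are all hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical type: a test case is an (input, expected_output) pair.
TestCase = Tuple[int, int]


def pass_rate(candidate: Callable[[int], int], tests: List[TestCase]) -> float:
    """Return the fraction of test cases the candidate passes.

    Using the pass rate as a score is an assumption made for this sketch.
    """
    if not tests:
        return 0.0
    passed = 0
    for arg, expected in tests:
        try:
            if candidate(arg) == expected:
                passed += 1
        except Exception:
            # A candidate that raises simply fails that test case.
            pass
    return passed / len(tests)


# Hand-written stand-in for generated tests; a real TCG would be a trained model.
tests = [(0, 0), (2, 4), (3, 9), (-1, 1)]


def square(x: int) -> int:
    return x * x


def double(x: int) -> int:
    # Deliberately wrong solution for "square the input".
    return 2 * x


reward_correct = pass_rate(square, tests)
reward_buggy = pass_rate(double, tests)
```

In this toy setup the correct candidate passes every case, while the buggy one passes only the cases where doubling and squaring happen to agree, so the two receive different scores.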
