OpenAI o1 System Card

Abstract

The o1 model series is trained with large-scale reinforcement learning to reason using chain of thought. These advanced reasoning capabilities provide new avenues for improving the safety and robustness of our models. In particular, our models can reason about our safety policies in context when responding to potentially unsafe prompts, through deliberative alignment. This leads to state-of-the-art performance on certain benchmarks for risks such as generating illicit advice, choosing stereotyped responses, and succumbing to known jailbreaks. Training models to incorporate a chain of thought before answering has the potential to unlock substantial benefits, while also increasing potential risks that stem from heightened intelligence. Our results underscore the need for building robust alignment methods, extensively stress-testing their efficacy, and maintaining meticulous risk management protocols. This report outlines the safety work carried out for the OpenAI o1 and OpenAI o1-mini models, including safety evaluations, external red teaming, and Preparedness Framework evaluations.
