OpenAI o1 System Card

Abstract

The o1 model series is trained with large-scale reinforcement learning to reason using chain of thought. These advanced reasoning capabilities provide new avenues for improving the safety and robustness of our models. In particular, through deliberative alignment, our models can reason about our safety policies in context when responding to potentially unsafe prompts. This leads to state-of-the-art performance on certain benchmarks for risks such as generating illicit advice, choosing stereotyped responses, and succumbing to known jailbreaks. Training models to incorporate a chain of thought before answering has the potential to unlock substantial benefits, while also increasing potential risks that stem from heightened intelligence. Our results underscore the need for building robust alignment methods, extensively stress-testing their efficacy, and maintaining meticulous risk management protocols. This report outlines the safety work carried out for the OpenAI o1 and OpenAI o1-mini models, including safety evaluations, external red teaming, and Preparedness Framework evaluations.