Baichuan Alignment Technical Report

Abstract

We introduce Baichuan Alignment, a detailed analysis of the alignment techniques employed in the Baichuan series of models. This represents the industry's first comprehensive account of alignment methodologies, offering valuable insights for advancing AI research. We investigate the critical components that enhance model performance during the alignment process, including optimization methods, data strategies, capability enhancements, and evaluation processes. The process spans three key stages: Prompt Augmentation System (PAS), Supervised Fine-Tuning (SFT), and Preference Alignment. The problems encountered, the solutions applied, and the improvements made are documented in detail. Through comparisons across well-established benchmarks, we highlight the technological advancements enabled by Baichuan Alignment. Baichuan-Instruct is an internal model, while Qwen2-Nova-72B and Llama3-PBM-Nova-70B are instruct versions of the Qwen2-72B and Llama-3-70B base models, optimized through Baichuan Alignment. Baichuan-Instruct demonstrates significant improvements in core capabilities, with user experience gains ranging from 17% to 28%, and performs exceptionally well on specialized benchmarks. In open-source benchmark evaluations, both Qwen2-Nova-72B and Llama3-PBM-Nova-70B consistently outperform their respective official instruct versions across nearly all datasets. This report aims to clarify the key technologies behind the alignment process, fostering a deeper understanding within the community. The Llama3-PBM-Nova-70B model is available at https://huggingface.co/PKU-Baichuan-MLSystemLab/Llama3-PBM-Nova-70B.

