SelfCodeAlign: Self-Alignment for Code Generation

Abstract

Instruction tuning is a supervised fine-tuning approach that significantly improves the ability of large language models (LLMs) to follow human instructions. We propose SelfCodeAlign, the first fully transparent and permissive pipeline for self-aligning code LLMs without extensive human annotations or distillation. SelfCodeAlign employs the same base model for inference throughout the data generation process. It first extracts diverse coding concepts from high-quality seed snippets to generate new tasks. It then samples multiple responses per task, pairs each with test cases, and validates them in a sandbox environment. Finally, passing examples are selected for instruction tuning. In our primary experiments, we use SelfCodeAlign with CodeQwen1.5-7B to generate a dataset of 74k instruction-response pairs. Finetuning on this dataset yields a model that achieves 67.1 pass@1 on HumanEval+, surpassing CodeLlama-70B-Instruct despite being ten times smaller. Across all benchmarks, this finetuned model consistently outperforms the original version trained with OctoPack, the previous state-of-the-art method for instruction tuning without human annotations or distillation. Additionally, we show that SelfCodeAlign is effective across LLMs of various sizes, from 3B to 33B, and that the base models can benefit more from alignment with their own data distribution. We further validate the effectiveness of each component in our pipeline, showing that SelfCodeAlign outperforms both direct distillation from GPT-4o and leading GPT-3.5-based distillation methods, such as OSS-Instruct and Evol-Instruct. SelfCodeAlign has also led to the creation of StarCoder2-Instruct, the first fully transparent, permissively licensed, and self-aligned code LLM that achieves state-of-the-art coding performance.
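
The generate, sample, validate, and select loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `extract_concepts`, `generate_task`, and `sample_response` are hypothetical callables standing in for prompts to the base model; `Example`, `passes_tests`, and `build_dataset` are names invented here; and a subprocess with a timeout only stands in for a real sandbox. The abstract does not say how many passing responses are kept per task, so this sketch simply keeps all of them.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List, Tuple


@dataclass
class Example:
    """One instruction-response pair kept for instruction tuning."""
    instruction: str
    response: str


def passes_tests(response: str, tests: str, timeout_s: float = 10.0) -> bool:
    """Run a response together with its tests in a separate Python process.

    Assumption: a subprocess with a timeout is NOT a real sandbox; it only
    illustrates the "execute and keep what passes" idea.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(response + "\n\n" + tests + "\n", encoding="utf-8")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout_s,
                cwd=tmp,
            )
        except subprocess.TimeoutExpired:
            return False
    # A zero exit code means every assertion in the tests held.
    return result.returncode == 0


def build_dataset(
    seeds: List[str],
    extract_concepts: Callable[[str], List[str]],
    generate_task: Callable[[List[str]], str],
    sample_response: Callable[[str], Tuple[str, str]],
    n_samples: int = 4,
) -> List[Example]:
    """Hypothetical driver: seed snippet -> concepts -> task -> validated responses."""
    dataset: List[Example] = []
    for seed in seeds:
        concepts = extract_concepts(seed)       # step 1: concepts from a seed snippet
        instruction = generate_task(concepts)   # step 2: a new task from those concepts
        for _ in range(n_samples):              # step 3: several (response, tests) samples
            response, tests = sample_response(instruction)
            if passes_tests(response, tests):   # step 4: execution-based validation
                dataset.append(Example(instruction, response))
    return dataset
```

In practice all three callables would wrap the same base model, which is what makes the procedure self-alignment rather than distillation from a stronger model.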
