

SelfCodeAlign: Self-Alignment for Code Generation

October 31, 2024
Authors: Yuxiang Wei, Federico Cassano, Jiawei Liu, Yifeng Ding, Naman Jain, Zachary Mueller, Harm de Vries, Leandro von Werra, Arjun Guha, Lingming Zhang
cs.AI

Abstract

Instruction tuning is a supervised fine-tuning approach that significantly improves the ability of large language models (LLMs) to follow human instructions. We propose SelfCodeAlign, the first fully transparent and permissive pipeline for self-aligning code LLMs without extensive human annotations or distillation. SelfCodeAlign employs the same base model for inference throughout the data generation process. It first extracts diverse coding concepts from high-quality seed snippets to generate new tasks. It then samples multiple responses per task, pairs each with test cases, and validates them in a sandbox environment. Finally, passing examples are selected for instruction tuning. In our primary experiments, we use SelfCodeAlign with CodeQwen1.5-7B to generate a dataset of 74k instruction-response pairs. Finetuning on this dataset leads to a model that achieves a 67.1 pass@1 on HumanEval+, surpassing CodeLlama-70B-Instruct despite being ten times smaller. Across all benchmarks, this finetuned model consistently outperforms the original version trained with OctoPack, the previous state-of-the-art method for instruction tuning without human annotations or distillation. Additionally, we show that SelfCodeAlign is effective across LLMs of various sizes, from 3B to 33B, and that the base models can benefit more from alignment with their own data distribution. We further validate each component's effectiveness in our pipeline, showing that SelfCodeAlign outperforms both direct distillation from GPT-4o and leading GPT-3.5-based distillation methods, such as OSS-Instruct and Evol-Instruct. SelfCodeAlign has also led to the creation of StarCoder2-Instruct, the first fully transparent, permissively licensed, and self-aligned code LLM that achieves state-of-the-art coding performance.
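To make the data-generation loop concrete, here is a minimal Python sketch of the pipeline the abstract describes: extract coding concepts from seed snippets, turn them into new tasks, sample candidate solutions and tests from the same base model, and keep only the pairs that pass sandboxed execution. The function names, prompts, and the subprocess-based "sandbox" below are illustrative assumptions, not the paper's actual implementation; `generate` stands in for sampling from the base model (e.g. CodeQwen1.5-7B).

```python
import subprocess
import sys
from typing import Callable, List, Tuple

# Hypothetical hook: a function that sends a prompt to the *base* model
# and returns its sampled completion. The real pipeline uses few-shot
# prompting of the same base model at every step.
Generate = Callable[[str], str]


def extract_concepts(generate: Generate, seed_snippet: str) -> List[str]:
    """Ask the base model to list coding concepts used in a seed snippet."""
    prompt = (
        "List the programming concepts illustrated by this code, one per line:\n"
        f"{seed_snippet}\n"
    )
    return [line.strip() for line in generate(prompt).splitlines() if line.strip()]


def generate_task(generate: Generate, concepts: List[str]) -> str:
    """Turn a set of concepts into a new, self-contained coding instruction."""
    prompt = (
        "Write a standalone Python programming task that exercises these concepts: "
        + ", ".join(concepts)
    )
    return generate(prompt)


def sample_response_with_tests(generate: Generate, task: str) -> Tuple[str, str]:
    """Sample one candidate solution and an accompanying test suite for the task."""
    solution = generate(f"Task:\n{task}\n\nWrite a Python solution:")
    tests = generate(f"Task:\n{task}\n\nWrite assert-based tests for a correct solution:")
    return solution, tests


def passes_in_sandbox(solution: str, tests: str, timeout_s: float = 10.0) -> bool:
    """Run solution + tests in a separate interpreter with a timeout.

    A subprocess is only a crude stand-in for the isolated sandbox used in the paper.
    """
    program = solution + "\n\n" + tests
    try:
        result = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def self_align(generate: Generate, seeds: List[str], samples_per_task: int = 4) -> List[dict]:
    """End-to-end sketch: seeds -> concepts -> tasks -> validated instruction-response pairs."""
    dataset = []
    for snippet in seeds:
        concepts = extract_concepts(generate, snippet)
        task = generate_task(generate, concepts)
        for _ in range(samples_per_task):
            solution, tests = sample_response_with_tests(generate, task)
            if passes_in_sandbox(solution, tests):
                dataset.append({"instruction": task, "response": solution})
                break  # keep one validated response per task for instruction tuning
    return dataset
```

The resulting instruction-response pairs would then be used for standard supervised fine-tuning of the same base model, which is the step that yields the reported HumanEval+ gains.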

