SelfCodeAlign: Self-Alignment for Code Generation
October 31, 2024
作者: Yuxiang Wei, Federico Cassano, Jiawei Liu, Yifeng Ding, Naman Jain, Zachary Mueller, Harm de Vries, Leandro von Werra, Arjun Guha, Lingming Zhang
cs.AI
Abstract
Instruction tuning is a supervised fine-tuning approach that significantly
improves the ability of large language models (LLMs) to follow human
instructions. We propose SelfCodeAlign, the first fully transparent and
permissive pipeline for self-aligning code LLMs without extensive human
annotations or distillation. SelfCodeAlign employs the same base model for
inference throughout the data generation process. It first extracts diverse
coding concepts from high-quality seed snippets to generate new tasks. It then
samples multiple responses per task, pairs each with test cases, and validates
them in a sandbox environment. Finally, passing examples are selected for
instruction tuning. In our primary experiments, we use SelfCodeAlign with
CodeQwen1.5-7B to generate a dataset of 74k instruction-response pairs.
Finetuning on this dataset leads to a model that achieves a 67.1 pass@1 on
HumanEval+, surpassing CodeLlama-70B-Instruct despite being ten times smaller.
Across all benchmarks, this finetuned model consistently outperforms the
original version trained with OctoPack, the previous state-of-the-art method
for instruction tuning without human annotations or distillation. Additionally,
we show that SelfCodeAlign is effective across LLMs of various sizes, from 3B
to 33B, and that the base models can benefit more from alignment with their own
data distribution. We further validate each component's effectiveness in our
pipeline, showing that SelfCodeAlign outperforms both direct distillation from
GPT-4o and leading GPT-3.5-based distillation methods, such as OSS-Instruct and
Evol-Instruct. SelfCodeAlign has also led to the creation of
StarCoder2-Instruct, the first fully transparent, permissively licensed, and
self-aligned code LLM that achieves state-of-the-art coding performance.
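For readers who want a concrete picture of the data-generation loop summarized above (concept extraction, task generation, response sampling, execution-based validation), here is a minimal Python sketch. It is an assumption-laden illustration, not the authors' released pipeline: the `generate` helper, the prompt wordings, and the `subprocess`-based sandbox are all placeholders.

```python
# Minimal sketch of a SelfCodeAlign-style self-alignment loop (illustrative only).
# The prompts, the `generate` helper, and the sandbox below are assumptions made
# for this sketch, not the authors' released implementation.
import subprocess
import sys
import tempfile

def generate(prompt: str, n: int = 1) -> list[str]:
    """Placeholder for sampling n completions from the *same* base model
    (e.g., CodeQwen1.5-7B). Plug in your own inference backend here."""
    raise NotImplementedError

def passes_in_sandbox(program: str, timeout: int = 10) -> bool:
    """Crude stand-in for an execution sandbox: run the candidate solution
    (which should embed its own test cases) and check the exit code."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program)
        path = f.name
    try:
        proc = subprocess.run([sys.executable, path], capture_output=True, timeout=timeout)
        return proc.returncode == 0
    except subprocess.TimeoutExpired:
        return False

def self_align(seed_snippets: list[str], samples_per_task: int = 10) -> list[dict]:
    """Generate execution-validated instruction-response pairs from seed snippets."""
    dataset = []
    for snippet in seed_snippets:
        # 1) Extract coding concepts from a high-quality seed snippet.
        concepts = generate(f"List the coding concepts used in:\n{snippet}")[0]
        # 2) Turn those concepts into a new, self-contained programming task.
        task = generate(f"Write a new programming task exercising: {concepts}")[0]
        # 3) Sample multiple responses, each paired with its own test cases.
        candidates = generate(
            f"Solve the task below and append test cases.\nTask: {task}",
            n=samples_per_task,
        )
        # 4) Keep a response only if its tests pass in the sandbox
        #    (keeping just the first passing one here is a simplification).
        for response in candidates:
            if passes_in_sandbox(response):
                dataset.append({"instruction": task, "response": response})
                break
    return dataset  # used afterwards for supervised instruction tuning
```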
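The validated pairs are then used for ordinary instruction tuning, i.e. supervised fine-tuning on instruction-response text. A minimal, hypothetical formatting step might look like the following; the template is illustrative and not the paper's actual prompt format.

```python
# Hypothetical formatting step: render each validated pair as a single training
# string for causal-LM supervised fine-tuning. The template is illustrative only.
PROMPT_TEMPLATE = "### Instruction\n{instruction}\n\n### Response\n{response}"

def to_training_text(pairs: list[dict]) -> list[str]:
    return [PROMPT_TEMPLATE.format(**pair) for pair in pairs]
```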
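The abstract reports results as pass@1 on HumanEval+. For reference, the standard unbiased pass@k estimator (Chen et al., 2021) used by such benchmarks can be computed as follows; this is generic evaluation code, not part of the SelfCodeAlign pipeline itself.

```python
# Standard unbiased pass@k estimator: with n samples per problem and c of them
# passing the tests, pass@k = 1 - C(n-c, k) / C(n, k).
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:
        return 1.0
    # Numerically stable product form of 1 - C(n-c, k) / C(n, k).
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Averaging pass_at_k(n, c, 1) over all HumanEval+ problems yields pass@1; with
# greedy decoding (n = 1), pass@1 reduces to the fraction of problems solved.
```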