Constraint Back-translation Improves Complex Instruction Following of Large Language Models

Abstract

Large language models (LLMs) struggle to follow instructions with complex constraints on format, length, and similar properties. Following conventional instruction-tuning practice, previous work conducts post-training on complex instruction-response pairs generated by feeding complex instructions to advanced LLMs. However, even advanced LLMs cannot follow complex instructions well, which limits the quality of the generated data. In this work, we find that existing datasets inherently contain implicit complex constraints and propose a novel data generation technique, constraint back-translation. Specifically, we take the high-quality instruction-response pairs in existing datasets and use advanced LLMs only to add to the instructions the complex constraints that the responses already satisfy, which naturally reduces cost and data noise. In our experiments, we use Llama3-70B-Instruct to back-translate constraints and create a high-quality complex instruction-response dataset, named CRAB. We show that post-training on CRAB improves the complex instruction-following ability of multiple backbone LLMs, as evaluated on a wide range of instruction-following benchmarks. We further find that constraint back-translation also serves as a useful auxiliary training objective in post-training. Our code, data, and models will be released to facilitate future research.
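
The core idea of constraint back-translation can be illustrated with a minimal sketch. It keeps the original response untouched and appends to the instruction only constraints that the response already satisfies. Note the assumptions: the abstract states that an LLM (Llama3-70B-Instruct) writes the constraints, whereas this sketch substitutes two simple rule-based checks (a word limit and a bullet count) as a stand-in. The names `Pair`, `back_translate_constraints`, and `augment` are hypothetical and are not taken from the released code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Pair:
    instruction: str
    response: str


def back_translate_constraints(response: str) -> list[str]:
    """Describe constraints that the given response already satisfies.

    Hypothetical rule-based stand-in for the LLM call described in the abstract.
    """
    constraints = []

    # Length constraint: use the word count rounded up to a multiple of 10
    # as an upper bound, so the response is guaranteed to meet it.
    n_words = len(response.split())
    limit = max(10, ((n_words + 9) // 10) * 10)
    constraints.append(f"Answer in at most {limit} words.")

    # Format constraint: only emitted when every non-empty line is a bullet.
    lines = [ln for ln in response.splitlines() if ln.strip()]
    if lines and all(ln.lstrip().startswith("- ") for ln in lines):
        constraints.append(
            f"Format the answer as a bulleted list with exactly {len(lines)} items."
        )
    return constraints


def augment(pair: Pair) -> Pair:
    """Return a new pair whose instruction carries the back-translated constraints."""
    constraints = back_translate_constraints(pair.response)
    new_instruction = pair.instruction + "\n" + "\n".join(constraints)
    # The response is reused as-is; no new response is generated.
    return Pair(new_instruction, pair.response)
```

Because the response is never regenerated, every added constraint is satisfied by construction, which is the property the abstract credits for reducing cost and data noise.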
