Constraint Back-translation Improves Complex Instruction Following of Large Language Models

Abstract

Large language models (LLMs) struggle to follow instructions that carry complex constraints on format, length, and similar properties. Following conventional instruction-tuning practice, previous work conducts post-training on complex instruction-response pairs generated by feeding complex instructions to advanced LLMs. However, even advanced LLMs cannot follow complex instructions well, which limits the quality of the generated data. In this work, we find that existing datasets inherently contain implicit complex constraints, and we propose a novel data generation technique, constraint back-translation. Specifically, we take high-quality instruction-response pairs from existing datasets and use advanced LLMs only to add, to each instruction, complex constraints that its response already satisfies, which naturally reduces cost and data noise. In the experiments, we adopt Llama3-70B-Instruct to back-translate constraints and create a high-quality complex instruction-response dataset named CRAB. We show that post-training on CRAB improves the complex instruction-following ability of multiple backbone LLMs, as evaluated on extensive instruction-following benchmarks. We further find that constraint back-translation also serves as a useful auxiliary training objective in post-training. Our code, data, and models will be released to facilitate future research.
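The data generation step described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the names `Example`, `back_translate_constraints`, and `toy_constraint_generator` are hypothetical, the way constraints are appended to the instruction is an assumption, and the toy generator stands in for an advanced LLM such as Llama3-70B-Instruct by deriving constraints directly from properties of the existing response.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Example:
    """One instruction-response pair (hypothetical container)."""
    instruction: str
    response: str


def back_translate_constraints(
    example: Example,
    generate_constraints: Callable[[str, str], List[str]],
    max_constraints: int = 3,
) -> Example:
    """Add constraints that the existing response already satisfies.

    The response is never regenerated; only the instruction changes.
    """
    constraints = generate_constraints(example.instruction, example.response)
    # Drop empty strings and cap the number of constraints kept.
    kept = [c.strip() for c in constraints if c.strip()][:max_constraints]
    if not kept:
        return example
    # Assumed formatting: constraints appended as a bulleted list.
    augmented = example.instruction + "\n" + "\n".join(f"- {c}" for c in kept)
    return Example(instruction=augmented, response=example.response)


def toy_constraint_generator(instruction: str, response: str) -> List[str]:
    """Stand-in for an advanced LLM (assumption, for illustration only).

    It reads constraints off the response, so they hold by construction.
    """
    word_count = len(response.split())
    constraints = [f"Answer in at most {word_count} words."]
    if response.lstrip().startswith("- "):
        constraints.append("Format the answer as a bulleted list.")
    return constraints


example = Example("List two fruits.", "- apple\n- banana")
augmented = back_translate_constraints(example, toy_constraint_generator)
print(augmented.instruction)
```

Because the constraints are inferred from a response that already exists, the pair stays consistent without asking a model to produce a new response under those constraints, which is the source of the cost and noise reduction the abstract mentions.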
