Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning

Abstract

Synthetic data has been widely used to train large language models, but its generative nature inevitably introduces noisy, non-informative, and misleading learning signals. In this paper, we propose Montessori-Instruct, a novel data synthesis framework that tailors the data synthesis ability of the teacher language model toward the student language model's learning process. Specifically, we use the local data influence of synthetic training data points on the student to characterize the student's learning preferences. We then train the teacher model with Direct Preference Optimization (DPO) to generate synthetic data tailored to those preferences. Experiments with Llama3-8B-Instruct (teacher) and Llama3-8B (student) on Alpaca Eval and MT-Bench demonstrate that Montessori-Instruct significantly outperforms standard synthesis methods, with relative improvements of 18.35% and 46.24%, respectively. Our method also beats data synthesized by a stronger teacher model, GPT-4o. Further analysis confirms the benefits of the teacher learning to generate more influential training data for the student's improved learning, the advantages of local data influence in accurately measuring student preferences, and the robustness of Montessori-Instruct across different student models. Our code and data are open-sourced at https://github.com/cxcscmu/Montessori-Instruct.
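
The two steps named in the abstract, scoring synthetic data points by their local data influence on the student and then training the teacher with DPO, can be illustrated with a minimal sketch of how preference pairs might be assembled. This is not the authors' implementation. It assumes that the local influence of a data point is the drop in the student's loss on a held-out reference set after training on that single point, and that the most and least influential candidates for a prompt form the chosen/rejected pair. The names `local_influence`, `build_preference_pair`, and `PreferencePair` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PreferencePair:
    """One DPO training example for the teacher (hypothetical container)."""
    prompt: str
    chosen: str
    rejected: str


def local_influence(loss_before: float, loss_after: float) -> float:
    # Assumed definition: how much the student's reference loss drops after
    # training on a single synthetic data point. Positive means helpful.
    return loss_before - loss_after


def build_preference_pair(
    prompt: str,
    candidates: List[str],
    influences: List[float],
) -> Optional[PreferencePair]:
    # Assumed pairing rule: the highest-influence candidate is "chosen" and
    # the lowest-influence candidate is "rejected".
    if len(candidates) != len(influences):
        raise ValueError("candidates and influences must have equal length")
    if len(candidates) < 2:
        raise ValueError("need at least two candidates to form a pair")

    best = max(range(len(candidates)), key=lambda i: influences[i])
    worst = min(range(len(candidates)), key=lambda i: influences[i])

    # If every candidate has the same influence there is no preference signal.
    if influences[best] <= influences[worst]:
        return None
    return PreferencePair(prompt, candidates[best], candidates[worst])


if __name__ == "__main__":
    # Toy numbers for illustration only; they are not results from the paper.
    scores = [
        local_influence(2.0, 1.5),   # helpful data point
        local_influence(2.0, 2.25),  # harmful data point
        local_influence(2.0, 1.75),  # mildly helpful data point
    ]
    pair = build_preference_pair(
        "Write an instruction for the student.",
        ["candidate A", "candidate B", "candidate C"],
        scores,
    )
    print(pair)
```

The resulting prompt/chosen/rejected triples follow the layout that DPO training code commonly expects. How the reference loss is measured and how candidates are sampled from the teacher are left out here because the abstract does not specify them.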
