Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning
October 18, 2024
Authors: Xiaochuan Li, Zichun Yu, Chenyan Xiong
cs.AI
Abstract
Synthetic data has been widely used to train large language models, but its
generative nature inevitably introduces noisy, non-informative, and misleading
learning signals. In this paper, we propose Montessori-Instruct, a novel data
synthesis framework that tailors the data synthesis ability of the teacher
language model toward the student language model's learning process.
Specifically, we utilize local data influence of synthetic training data points
on students to characterize students' learning preferences. Then, we train the
teacher model with Direct Preference Optimization (DPO) to generate synthetic
data tailored toward student learning preferences. Experiments with
Llama3-8B-Instruct (teacher) and Llama3-8B (student) on Alpaca Eval and
MT-Bench demonstrate that Montessori-Instruct significantly outperforms
standard synthesis methods, with relative improvements of 18.35% and 46.24%,
respectively. Our method also beats data synthesized by a stronger teacher
model, GPT-4o. Further analysis confirms three points: training the teacher
to generate more influential data improves the student's learning, local data
influence accurately measures student preferences, and Montessori-Instruct is
robust across different student models. Our code and data are open-sourced at
https://github.com/cxcscmu/Montessori-Instruct.
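
The two steps described above (scoring synthetic data points by their local influence on the student, then turning those scores into preference pairs for DPO) can be illustrated with a toy sketch. The abstract does not spell out how local data influence is computed. This sketch assumes it is the drop in the student's loss on a held-out reference set after one gradient step on a candidate data point. The student here is a one-parameter linear model standing in for a language model, and every function name is hypothetical rather than taken from the authors' repository. The `dpo_loss` function is the standard per-pair DPO objective.

```python
import math

def mse(w, data):
    """Mean squared error of the toy student y = w * x on (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def one_step(w, point, lr=0.1):
    """One SGD step of the toy student on a single training point."""
    x, y = point
    grad = 2 * (w * x - y) * x
    return w - lr * grad

def local_influence(w, point, reference, lr=0.1):
    """Assumed definition: reference-loss reduction after one step on `point`.

    Positive values mean the point helped the student; negative values
    mean it hurt.
    """
    return mse(w, reference) - mse(one_step(w, point, lr), reference)

def build_preference_pair(w, candidates, reference):
    """Return (chosen, rejected): the most and least influential candidates."""
    ranked = sorted(candidates, key=lambda p: local_influence(w, p, reference))
    return ranked[-1], ranked[0]

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one pair, given sequence log-probabilities
    under the trained policy (logp_*) and the frozen reference (ref_*)."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy setup: the reference set follows y = 2x and the student starts at w = 0.
reference = [(1.0, 2.0), (2.0, 4.0)]
w0 = 0.0
# Three synthetic candidates for the same prompt: helpful, misleading, neutral.
candidates = [(1.0, 2.0), (1.0, -2.0), (1.0, 0.0)]

chosen, rejected = build_preference_pair(w0, candidates, reference)
print("chosen:", chosen, "rejected:", rejected)
```

In a realistic setting the candidates would be instructions sampled from the teacher for the same seed prompt, and the log-probabilities passed to `dpo_loss` would come from the teacher being trained and a frozen copy of it.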