How Instruction and Reasoning Data shape Post-Training: Data Quality through the Lens of Layer-wise Gradients
April 14, 2025
Authors: Ming Li, Yanhong Li, Ziyue Li, Tianyi Zhou
cs.AI
Abstract
As the post-training of large language models (LLMs) advances from
instruction-following to complex reasoning tasks, how different data affect
finetuning dynamics remains largely unexplored. In this paper, we present a
spectral analysis of layer-wise gradients induced by low- and high-quality
instruction and reasoning data for LLM post-training. Our analysis reveals that
widely studied data-evaluation metrics, e.g., IFD, InsTag, Difficulty, and
Reward, can be explained and unified by spectral properties computed from the
singular value decomposition (SVD) of the gradients. Specifically,
higher-quality data are usually associated with lower nuclear norms and higher
effective ranks. Notably, effective rank exhibits better robustness and
resolution than nuclear norm in capturing subtle quality differences. For
example, reasoning data achieve substantially higher effective ranks than
instruction data, implying richer gradient structures on more complex tasks.
Our experiments also show that models within the same family share similar
gradient patterns regardless of their sizes, whereas different model families
diverge significantly. By providing a unified view of the effects of data
quality across instruction and reasoning data, this work illuminates the
interplay between data quality and training stability and offers novel
insights for developing better data exploration strategies for post-training.