How Instruction and Reasoning Data shape Post-Training: Data Quality through the Lens of Layer-wise Gradients
April 14, 2025
Authors: Ming Li, Yanhong Li, Ziyue Li, Tianyi Zhou
cs.AI
Abstract
As the post-training of large language models (LLMs) advances from
instruction-following to complex reasoning tasks, understanding how different
data affect finetuning dynamics remains largely unexplored. In this paper, we
present a spectral analysis of layer-wise gradients induced by low/high-quality
instruction and reasoning data for LLM post-training. Our analysis reveals that
widely-studied metrics for data evaluation, e.g., IFD, InsTag, Difficulty, and
Reward, can be explained and unified by spectral properties computed from
gradients' singular value decomposition (SVD). Specifically, higher-quality
data are usually associated with lower nuclear norms and higher effective
ranks. Notably, effective rank exhibits better robustness and resolution than
nuclear norm in capturing subtle quality differences. For example, reasoning
data achieve substantially higher effective ranks than instruction data,
implying richer gradient structures on more complex tasks. Our experiments also
highlight that models within the same family share similar gradient patterns
regardless of their sizes, whereas different model families diverge
significantly. By providing a unified view of the effects of data quality across
instruction and reasoning data, this work illuminates the interplay between
data quality and training stability, offering new insights for developing
better data exploration strategies for post-training.
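To make the two spectral quantities named in the abstract concrete, here is a minimal sketch of how they could be computed from a single layer's gradient matrix. The abstract does not spell out its definition of effective rank; this sketch assumes the common entropy-based one (the exponential of the Shannon entropy of the normalized singular values). The function name `spectral_stats` and the toy matrices are hypothetical and are not taken from the paper.

```python
import numpy as np


def spectral_stats(grad):
    """Return (nuclear_norm, effective_rank) for a 2-D gradient matrix.

    Assumption: effective rank is exp(H(p)), where p is the vector of
    singular values normalized to sum to 1 and H is Shannon entropy.
    """
    # Singular values only; the singular vectors are not needed here.
    s = np.linalg.svd(np.asarray(grad, dtype=float), compute_uv=False)

    # Nuclear norm is the sum of the singular values.
    nuclear_norm = float(s.sum())
    if nuclear_norm == 0.0:
        return 0.0, 0.0  # all-zero gradient: nothing to measure

    # Normalize singular values into a probability distribution.
    p = s / nuclear_norm
    p = p[p > 0]  # drop exact zeros so log() stays finite

    entropy = -np.sum(p * np.log(p))
    return nuclear_norm, float(np.exp(entropy))


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Toy stand-ins for layer gradients (hypothetical shapes).
    rank_one = rng.standard_normal((64, 1)) @ rng.standard_normal((1, 32))
    dense = rng.standard_normal((64, 32))

    for name, g in [("rank-one", rank_one), ("dense", dense)]:
        nn, er = spectral_stats(g)
        print(f"{name}: nuclear norm = {nn:.3f}, effective rank = {er:.3f}")
```

Under this definition, a gradient whose energy sits in a single direction has an effective rank close to 1, while one that spreads energy evenly over k directions has an effective rank close to k. That is the sense in which a higher effective rank suggests a richer gradient structure. In practice the same function would be applied per layer to the gradient of each weight matrix.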