ZClip: Adaptive Spike Mitigation for LLM Pre-Training
April 3, 2025
Authors: Abhay Kumar, Louis Owen, Nilabhra Roy Chowdhury, Fabian Güra
cs.AI
Abstract
Training large language models (LLMs) presents numerous challenges, including
gradient instability and loss spikes. These phenomena can lead to catastrophic
divergence, requiring costly checkpoint restoration and data batch skipping.
Traditional gradient clipping techniques, such as constant or norm-based
methods, fail to address these issues effectively due to their reliance on
fixed thresholds or heuristics, leading to inefficient learning and requiring
frequent manual intervention. In this work, we propose ZClip, an adaptive
gradient clipping algorithm that dynamically adjusts the clipping threshold
based on statistical properties of gradient norms over time. Unlike prior
reactive strategies, ZClip proactively adapts to training dynamics without
making any prior assumptions on the scale and the temporal evolution of
gradient norms. At its core, it leverages z-score-based anomaly detection to
identify and mitigate large gradient spikes, preventing malignant loss spikes
while not interfering with convergence otherwise. Our code is available at:
https://github.com/bluorion-com/ZClip.
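
For illustration, the following is a minimal sketch of the idea described in the abstract: z-score-based anomaly detection on running statistics of the gradient norm, with gradients rescaled whenever a spike is detected. It is not the authors' implementation (see the repository above for that); the class name ZScoreGradClipper, the hyperparameters ema_alpha and z_threshold, and the EMA-based estimates of the norm's mean and variance are illustrative assumptions.

    import torch

    class ZScoreGradClipper:
        """Illustrative z-score-based adaptive gradient clipping (not the official ZClip code).

        Tracks an exponential moving average (EMA) of the total gradient norm and its
        variance; if the current norm's z-score exceeds a threshold, gradients are
        rescaled toward the running statistics before the optimizer step.
        """

        def __init__(self, ema_alpha: float = 0.97, z_threshold: float = 2.5):
            self.ema_alpha = ema_alpha      # smoothing factor for the running statistics (assumed value)
            self.z_threshold = z_threshold  # z-score above which a norm is treated as a spike (assumed value)
            self.mean = None                # EMA of gradient norms
            self.var = None                 # EMA of squared deviations from the mean

        def step(self, model: torch.nn.Module) -> float:
            # Collect gradients and compute the total L2 norm.
            grads = [p.grad for p in model.parameters() if p.grad is not None]
            if not grads:
                return 0.0
            total_norm = torch.norm(torch.stack([g.norm(2) for g in grads]), 2).item()

            if self.mean is None:
                # Initialize the running statistics from the first observed norm.
                self.mean, self.var = total_norm, 0.0
                return total_norm

            std = max(self.var ** 0.5, 1e-6)
            z = (total_norm - self.mean) / std

            if z > self.z_threshold:
                # Spike detected: rescale gradients so the norm matches the
                # threshold implied by the running statistics.
                clip_to = self.mean + self.z_threshold * std
                scale = clip_to / total_norm
                for g in grads:
                    g.mul_(scale)
                total_norm = clip_to

            # Update the EMA statistics with the (possibly clipped) norm.
            self.mean = self.ema_alpha * self.mean + (1 - self.ema_alpha) * total_norm
            self.var = self.ema_alpha * self.var + (1 - self.ema_alpha) * (total_norm - self.mean) ** 2
            return total_norm

In a training loop, clipper.step(model) would be called after loss.backward() and before optimizer.step(), so that spikes are suppressed before the parameter update.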