Cautious Optimizers: Improving Training with One Line of Code

November 25, 2024
Authors: Kaizhao Liang, Lizhang Chen, Bo Liu, Qiang Liu
cs.AI

Abstract

AdamW has been the default optimizer for transformer pretraining. For many years, our community has searched for faster and more stable optimizers, with only limited positive outcomes. In this work, we propose a single-line modification in PyTorch to any momentum-based optimizer, which we rename the Cautious Optimizer, e.g., C-AdamW and C-Lion. Our theoretical result shows that this modification preserves Adam's Hamiltonian function and does not break the convergence guarantee under Lyapunov analysis. In addition, our theoretical insight reveals a whole new family of optimizers. Among them, we pick the simplest one for empirical experiments, showing speed-ups of up to 1.47× on Llama and MAE pretraining. Code is available at https://github.com/kyleliang919/C-Optim.

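The full implementation lives in the linked C-Optim repository. As a rough illustration of the idea described in the abstract, the sketch below shows what a sign-agreement ("cautious") mask applied to a momentum-based update might look like in PyTorch. The function name, the normalization constant, and the usage comment are assumptions for illustration, not the authors' exact code.

```python
# Minimal sketch of the "cautious" masking idea, assuming the rule is:
# keep only the update components whose sign agrees with the current
# gradient, and rescale the survivors so the overall update magnitude
# stays roughly unchanged. Names and normalization are illustrative.

import torch


def cautious_mask(update: torch.Tensor, grad: torch.Tensor) -> torch.Tensor:
    """Zero out components of a momentum-based update that point against
    the current gradient, then renormalize the remaining ones."""
    mask = (update * grad > 0).to(update.dtype)   # 1 where signs agree, 0 otherwise
    scale = mask.numel() / (mask.sum() + 1.0)     # compensate for zeroed entries
    return update * mask * scale


# Hypothetical usage inside a hand-rolled Adam-style step, where exp_avg is
# the first-moment estimate and denom the second-moment denominator:
# p.data.add_(cautious_mask(exp_avg, p.grad) / denom, alpha=-lr)
```

The point of the rescaling is that masking alone would shrink the expected step size; dividing by the fraction of surviving entries keeps the effective learning rate comparable to the unmodified optimizer.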