Cautious Optimizers: Improving Training with One Line of Code

November 25, 2024
Authors: Kaizhao Liang, Lizhang Chen, Bo Liu, Qiang Liu
cs.AI

Abstract

AdamW has been the default optimizer for transformer pretraining. For many years, our community has searched for faster and more stable optimizers, with only constrained positive outcomes. In this work, we propose a single-line modification in PyTorch to any momentum-based optimizer, which we name the Cautious Optimizer, e.g., C-AdamW and C-Lion. Our theoretical result shows that this modification preserves Adam's Hamiltonian function and does not break the convergence guarantee under Lyapunov analysis. In addition, our theoretical insight reveals a whole new family of optimizers. Among them, we pick the simplest one for empirical experiments, showing speed-ups of up to 1.47x on Llama and MAE pretraining. Code is available at https://github.com/kyleliang919/C-Optim.
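The one-line change described above can be sketched in PyTorch. The snippet below is a minimal illustration, not the authors' exact code: it applies the cautious mask on top of a plain SGD-with-momentum step, and the function name cautious_momentum_step is a label chosen here for clarity. The idea is to zero out the coordinates of the proposed update whose sign disagrees with the current gradient; the exact C-AdamW and C-Lion implementations (which may also rescale the masked update) live in the repository linked above.

import torch

def cautious_momentum_step(param, grad, momentum_buf, lr=0.1, beta=0.9):
    # Standard SGD-with-momentum: accumulate the gradient into the buffer.
    momentum_buf.mul_(beta).add_(grad)
    update = momentum_buf
    # The "cautious" one-liner: keep only coordinates where the proposed
    # update and the current gradient agree in sign; zero out the rest.
    mask = (update * grad > 0).to(grad.dtype)
    param.add_(update * mask, alpha=-lr)
    return param, momentum_buf

# Usage on a toy parameter tensor:
param = torch.randn(4)
buf = torch.zeros_like(param)
for _ in range(3):
    grad = torch.randn(4)  # stand-in for a real gradient
    param, buf = cautious_momentum_step(param, grad, buf)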
