Cautious Optimizers: Improving Training with One Line of Code

Abstract

AdamW has been the default optimizer for transformer pretraining. For many years, our community has searched for faster and more stable optimizers, with only limited positive outcomes. In this work, we propose a single-line modification in PyTorch to any momentum-based optimizer, which we call a Cautious Optimizer, e.g., C-AdamW and C-Lion. Our theoretical results show that this modification preserves Adam's Hamiltonian function and does not break the convergence guarantee under Lyapunov analysis. In addition, our theoretical insight reveals a whole new family of optimizers. Among them, we pick the simplest one for empirical experiments, showing speed-ups of up to 1.47× on Llama and MAE pretraining. Code is available at https://github.com/kyleliang919/C-Optim
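The abstract does not spell out the modification itself. As an illustration only, the sketch below assumes the rule is a sign-agreement mask: a coordinate of the momentum update is applied only when it has the same sign as the current gradient, and the surviving coordinates are rescaled by the fraction kept. The function name `cautious_momentum_step`, the plain-list representation, and the `eps` floor are hypothetical choices made for this example. They are not taken from the authors' repository, and the sketch uses plain momentum rather than AdamW or Lion.

```python
def cautious_momentum_step(params, grads, momentum, lr=0.1, beta=0.9, eps=1e-3):
    """One momentum step with an assumed 'cautious' mask, on flat lists of floats.

    Returns (new_params, new_momentum); the inputs are not modified.
    """
    # Standard momentum: exponential moving average of the gradients.
    new_momentum = [beta * m + (1.0 - beta) * g for m, g in zip(momentum, grads)]

    # Assumed cautious mask: keep a coordinate only when the proposed update
    # and the current gradient point the same way (their product is positive).
    mask = [1.0 if m * g > 0 else 0.0 for m, g in zip(new_momentum, grads)]

    # Assumed rescaling: divide by the fraction of kept coordinates (floored
    # at eps) so that masking does not shrink the overall step size.
    scale = 1.0 / max(sum(mask) / len(mask), eps)

    new_params = [
        p - lr * scale * k * m for p, k, m in zip(params, mask, new_momentum)
    ]
    return new_params, new_momentum
```

In this sketch, a coordinate whose accumulated momentum disagrees with the fresh gradient is left unchanged for that step, while its momentum buffer is still updated as usual. Relative to an ordinary momentum step, the mask and the rescaling are the only additions.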
