Efficient Generative Model Training via Embedded Representation Warmup
April 14, 2025
Authors: Deyuan Liu, Peng Sun, Xufeng Li, Tao Lin
cs.AI
Abstract
Diffusion models excel at generating high-dimensional data but fall short in
training efficiency and representation quality compared to self-supervised
methods. We identify a key bottleneck: the underutilization of high-quality,
semantically rich representations during training notably slows down
convergence. Our systematic analysis reveals a critical representation
processing region -- primarily in the early layers -- where semantic and
structural pattern learning takes place before generation can occur. To address
this, we propose Embedded Representation Warmup (ERW), a plug-and-play
framework in which, in the first stage, the ERW module serves as a warmup
that initializes the early layers of the diffusion model with high-quality,
pretrained representations. This warmup minimizes the burden of learning
representations from scratch, thereby accelerating convergence and boosting
performance. Our theoretical analysis demonstrates that ERW's efficacy depends
on its precise integration into specific neural network layers -- termed the
representation processing region -- where the model primarily processes and
transforms feature representations for later generation. We further establish
that ERW not only accelerates training convergence but also enhances
representation quality: empirically, our method achieves a 40×
acceleration in training speed compared to REPA, the current state-of-the-art
method. Code is available at https://github.com/LINs-lab/ERW.