Efficient Generative Model Training via Embedded Representation Warmup

April 14, 2025
Authors: Deyuan Liu, Peng Sun, Xufeng Li, Tao Lin
cs.AI

Abstract

Diffusion models excel at generating high-dimensional data but fall short of self-supervised methods in training efficiency and representation quality. We identify a key bottleneck: the underutilization of high-quality, semantically rich representations during training, which notably slows convergence. Our systematic analysis reveals a critical representation processing region, located primarily in the early layers, where semantic and structural patterns are learned before generation can occur. To address this, we propose Embedded Representation Warmup (ERW), a plug-and-play framework in which, during the first stage, the ERW module serves as a warmup that initializes the early layers of the diffusion model with high-quality, pretrained representations. This warmup reduces the burden of learning representations from scratch, thereby accelerating convergence and boosting performance. Our theoretical analysis demonstrates that ERW's efficacy depends on its precise integration into specific neural network layers, termed the representation processing region, where the model primarily processes and transforms feature representations for later generation. We further establish that ERW not only accelerates training convergence but also enhances representation quality: empirically, our method achieves a 40× acceleration in training speed compared to REPA, the current state-of-the-art method. Code is available at https://github.com/LINs-lab/ERW.
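To make the two-stage idea concrete, here is a deliberately tiny sketch in plain Python. It is not the authors' implementation (that lives in the linked repository) and every name in it is hypothetical: `pretrained_encoder` stands in for a frozen pretrained representation model, `Linear` is a scalar layer standing in for the early "representation processing" layers, and `ToyDiffusionModel` stands in for the full network. It also assumes, which the abstract does not specify, that the warmup is done by regressing the early layer onto the encoder's features with a squared-error loss.

```python
import random
from dataclasses import dataclass


@dataclass
class Linear:
    """Hypothetical toy layer y = w * x + b, standing in for a block of early layers."""
    w: float = 0.0
    b: float = 0.0

    def __call__(self, x: float) -> float:
        return self.w * x + self.b


def pretrained_encoder(x: float) -> float:
    """Hypothetical frozen pretrained encoder (assumption): a fixed affine map."""
    return 2.0 * x + 1.0


def warmup_early_layer(layer: Linear, data: list[float],
                       lr: float = 0.1, steps: int = 500) -> Linear:
    """Stage 1 (sketch): fit the early layer to the encoder's features by
    gradient descent on the mean squared error (assumed objective)."""
    n = len(data)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x in data:
            err = layer(x) - pretrained_encoder(x)
            grad_w += 2.0 * err * x / n  # d(MSE)/dw
            grad_b += 2.0 * err / n      # d(MSE)/db
        layer.w -= lr * grad_w
        layer.b -= lr * grad_b
    return layer


@dataclass
class ToyDiffusionModel:
    """Hypothetical model: warmed-up early layer followed by a later 'generation' layer."""
    early: Linear
    late: Linear

    def __call__(self, x: float) -> float:
        return self.late(self.early(x))


if __name__ == "__main__":
    random.seed(0)
    data = [random.uniform(-1.0, 1.0) for _ in range(64)]

    # Stage 1: warm up the early layer from the pretrained representation.
    early = warmup_early_layer(Linear(), data)
    print(f"warmed-up early layer: w={early.w:.3f}, b={early.b:.3f}")

    # Stage 2: assemble the full model around the warmed-up early layer.
    # The late layer starts fresh; its training with the actual diffusion
    # objective is omitted from this sketch.
    model = ToyDiffusionModel(early=early, late=Linear(w=1.0, b=0.0))
    print(model(0.5))
```

The design point the sketch tries to mirror is the split of responsibilities: the early layer arrives at stage 2 already encoding the pretrained features, so the later training only has to learn generation on top of them rather than learning representations from scratch.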