Skrr: Skip and Re-use Text Encoder Layers for Memory Efficient Text-to-Image Generation
February 12, 2025
Authors: Hoigi Seo, Wongi Jeong, Jae-sun Seo, Se Young Chun
cs.AI
Abstract
Large-scale text encoders in text-to-image (T2I) diffusion models have
demonstrated exceptional performance in generating high-quality images from
textual prompts. Unlike denoising modules that rely on multiple iterative
steps, text encoders require only a single forward pass to produce text
embeddings. However, despite their minimal contribution to total inference time
and floating-point operations (FLOPs), text encoders demand significantly
higher memory usage, up to eight times more than denoising modules. To address
this inefficiency, we propose Skip and Re-use layers (Skrr), a simple yet
effective pruning strategy specifically designed for text encoders in T2I
diffusion models. Skrr exploits the inherent redundancy in transformer blocks
by selectively skipping or reusing certain layers in a manner tailored for T2I
tasks, thereby reducing memory consumption without compromising performance.
Extensive experiments demonstrate that Skrr maintains image quality comparable
to the original model even under high sparsity levels, outperforming existing
blockwise pruning methods. Furthermore, Skrr achieves state-of-the-art memory
efficiency while preserving performance across multiple evaluation metrics,
including the FID, CLIP, DreamSim, and GenEval scores.
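To make the skip-and-reuse idea concrete, here is a minimal, hypothetical PyTorch sketch (not the authors' released code). The `SkipReuseEncoder` wrapper and the `layer_plan` list are assumed names: each original depth position is either skipped (`None`) or served by the index of a retained block, so a retained block may be re-used at several depths while pruned blocks are never stored in memory.

```python
import torch
import torch.nn as nn

class SkipReuseEncoder(nn.Module):
    """Hypothetical sketch of skip-and-reuse pruning for a stack of
    transformer blocks. Only retained blocks are kept as parameters;
    the layer plan decides, per original depth, whether to skip the
    layer or re-run one of the retained blocks in its place."""

    def __init__(self, retained_blocks: nn.ModuleList, layer_plan):
        super().__init__()
        self.blocks = retained_blocks   # blocks that survive pruning
        self.layer_plan = layer_plan    # e.g. [0, 1, None, 1, None, 2]

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        for entry in self.layer_plan:
            if entry is None:           # skip: this depth contributes nothing
                continue
            # re-use: the same retained block may appear at multiple depths
            hidden_states = self.blocks[entry](hidden_states)
        return hidden_states

# Toy usage with generic encoder layers standing in for the text encoder.
dim = 64
retained = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
    for _ in range(3)
)
encoder = SkipReuseEncoder(retained, layer_plan=[0, 1, None, 1, None, 2])
tokens = torch.randn(2, 16, dim)        # (batch, sequence, hidden)
print(encoder(tokens).shape)            # torch.Size([2, 16, 64])
```

In this sketch, six original layers are served by three retained blocks, illustrating how memory falls with the number of stored blocks while the forward depth can be partially preserved through re-use.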