Skrr: Skip and Re-use Text Encoder Layers for Memory Efficient Text-to-Image Generation
February 12, 2025
Authors: Hoigi Seo, Wongi Jeong, Jae-sun Seo, Se Young Chun
cs.AI
Abstract
Large-scale text encoders in text-to-image (T2I) diffusion models have
demonstrated exceptional performance in generating high-quality images from
textual prompts. Unlike denoising modules that rely on multiple iterative
steps, text encoders require only a single forward pass to produce text
embeddings. However, despite their minimal contribution to total inference time
and floating-point operations (FLOPs), text encoders demand significantly
higher memory usage, up to eight times more than denoising modules. To address
this inefficiency, we propose Skip and Re-use layers (Skrr), a simple yet
effective pruning strategy specifically designed for text encoders in T2I
diffusion models. Skrr exploits the inherent redundancy in transformer blocks
by selectively skipping or reusing certain layers in a manner tailored for T2I
tasks, thereby reducing memory consumption without compromising performance.
Extensive experiments demonstrate that Skrr maintains image quality comparable
to the original model even under high sparsity levels, outperforming existing
blockwise pruning methods. Furthermore, Skrr achieves state-of-the-art memory
efficiency while preserving performance across multiple evaluation metrics,
including the FID, CLIP, DreamSim, and GenEval scores.
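To make the skip-and-reuse idea concrete, here is a minimal, hypothetical PyTorch sketch (not the authors' released code). The `SkipReuseEncoder` wrapper and the `layer_plan` list are assumed names: each original depth position is either skipped (`None`) or served by the index of a retained block, so a retained block may be re-used at several depths while pruned blocks are never stored in memory.

```python
import torch
import torch.nn as nn

class SkipReuseEncoder(nn.Module):
    """Hypothetical sketch of skip-and-reuse pruning for a stack of
    transformer blocks. Only retained blocks are kept as parameters;
    the layer plan decides, per original depth, whether to skip the
    layer or re-run one of the retained blocks in its place."""

    def __init__(self, retained_blocks: nn.ModuleList, layer_plan):
        super().__init__()
        self.blocks = retained_blocks   # blocks that survive pruning
        self.layer_plan = layer_plan    # e.g. [0, 1, None, 1, None, 2]

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        for entry in self.layer_plan:
            if entry is None:           # skip: this depth contributes nothing
                continue
            # re-use: the same retained block may appear at multiple depths
            hidden_states = self.blocks[entry](hidden_states)
        return hidden_states

# Toy usage with generic encoder layers standing in for the text encoder.
dim = 64
retained = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
    for _ in range(3)
)
encoder = SkipReuseEncoder(retained, layer_plan=[0, 1, None, 1, None, 2])
tokens = torch.randn(2, 16, dim)        # (batch, sequence, hidden)
print(encoder(tokens).shape)            # torch.Size([2, 16, 64])
```

In this sketch, six original layers are served by three retained blocks, illustrating how memory falls with the number of stored blocks while the forward depth can be partially preserved through re-use.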