CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up
December 20, 2024
Authors: Songhua Liu, Zhenxiong Tan, Xinchao Wang
cs.AI
Abstract
Diffusion Transformers (DiT) have become a leading architecture in image
generation. However, the quadratic complexity of attention mechanisms, which
are responsible for modeling token-wise relationships, results in significant
latency when generating high-resolution images. To address this issue, we aim
at a linear attention mechanism in this paper that reduces the complexity of
pre-trained DiTs to linear. We begin our exploration with a comprehensive
summary of existing efficient attention mechanisms and identify four key
factors crucial for successful linearization of pre-trained DiTs: locality,
formulation consistency, high-rank attention maps, and feature integrity. Based
on these insights, we introduce a convolution-like local attention strategy
termed CLEAR, which limits feature interactions to a local window around each
query token, and thus achieves linear complexity. Our experiments indicate
that, by fine-tuning the attention layer on merely 10K self-generated samples
for 10K iterations, we can effectively transfer knowledge from a pre-trained
DiT to a student model with linear complexity, yielding results comparable to
the teacher model. Simultaneously, it reduces attention computations by 99.5%
and accelerates generation by 6.3 times for generating 8K-resolution images.
Furthermore, we investigate favorable properties in the distilled attention
layers, such as zero-shot generalization across various models and plugins, and
improved support for multi-GPU parallel inference. Models and codes are
available here: https://github.com/Huage001/CLEAR.
AI-Generated Summary
Paper Overview
Core Contributions
- Proposes CLEAR, a convolution-like local attention strategy that linearizes the attention mechanism of pre-trained Diffusion Transformers (DiT), substantially improving the efficiency of high-resolution image generation.
- Via knowledge distillation, only 10K fine-tuning iterations are needed to transfer the knowledge of a pre-trained DiT to a student model with linear complexity, whose outputs are comparable to the original model's.
- Reduces attention computation by 99.5% and accelerates generation by 6.3x when producing 8K-resolution images.
Research Background
- Diffusion Transformers (DiT) excel at image generation, but the quadratic complexity of their attention mechanism causes significant latency when generating high-resolution images.
- Existing efficient attention mechanisms perform poorly when applied to pre-trained DiTs and fail to linearize them effectively.
Keywords
- Diffusion Transformer (DiT)
- Linear attention
- Convolution-like local attention
- Knowledge distillation
- High-resolution image generation
Background
Research Gap
- Existing efficient attention mechanisms perform poorly when applied to pre-trained DiTs and fail to linearize them effectively.
- There is no linearization strategy tailored to pre-trained DiTs.
Technical Challenges
- How to linearize the attention mechanism of a pre-trained DiT while preserving generation quality.
- How to substantially reduce computational complexity and latency in high-resolution image generation.
Existing Approaches
- Linear attention: achieves linear complexity by omitting the softmax operation, but performs poorly on pre-trained DiTs (see the formulation sketch after this list).
- Key-value compression: reduces computation by compressing key-value pairs, but causes loss of detail.
- Key-value sampling: reduces computation by sampling key-value pairs, but requires local tokens to produce visually coherent results.
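For reference, standard softmax attention and the kernel-based linear attention that typically replaces it can be written as follows. This is the generic textbook formulation, not a reproduction of any specific method evaluated in the paper:

$$\mathrm{Attn}(Q,K,V)=\mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d}}\right)V \;\Rightarrow\; O(N^{2}d), \qquad \mathrm{LinAttn}(Q,K,V)=\phi(Q)\bigl(\phi(K)^{\top}V\bigr) \;\Rightarrow\; O(Nd^{2}),$$

where $N$ is the number of tokens, $d$ the head dimension, and $\phi$ a kernel feature map. Computing $\phi(K)^{\top}V$ first avoids the $N \times N$ attention map, but it also drops the softmax and changes the formulation, which the paper identifies (formulation consistency) as one reason such methods transfer poorly to pre-trained DiTs.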
Methodology
Technical Architecture
- CLEAR adopts a convolution-like local attention strategy that restricts each query token to interacting only with tokens inside a local window, thereby achieving linear complexity (see the complexity sketch after this list).
- Knowledge distillation transfers the knowledge of the pre-trained DiT to the linear-complexity student model.
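As a rough complexity sketch, assume each query attends to a fixed local window of $w$ key/value tokens out of $N$ image tokens (the exact window shape and radius follow the paper):

$$\text{full attention: } O(N^{2}d) \quad\longrightarrow\quad \text{CLEAR local attention: } O(Nwd).$$

Because $w$ is a constant independent of resolution, the cost grows linearly with $N$; at 8K resolution the ratio $w/N$ becomes very small, consistent with the reported 99.5% reduction in attention computation.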
Implementation Details
- CLEAR is implemented efficiently in PyTorch with FlexAttention, which supports GPU-level optimization (a hedged usage sketch follows this list).
- Fine-tuning runs for 10K iterations on 10K self-generated samples, updating only the attention-layer parameters.
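The sketch below shows how a circular local-window mask could be expressed with PyTorch's FlexAttention API (torch.nn.attention.flex_attention, available in recent PyTorch releases). The grid size, window radius, and tensor shapes are illustrative assumptions, not the paper's exact configuration:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

# Illustrative settings (assumptions): a 64x64 grid of image tokens, window radius 8.
GRID = 64          # tokens per side of the latent feature map
RADIUS = 8         # local window radius, in token units
B, H, D = 1, 24, 128
N = GRID * GRID    # total number of image tokens

def local_window_mask(b, h, q_idx, kv_idx):
    # Recover 2D coordinates from flattened token indices.
    qy, qx = q_idx // GRID, q_idx % GRID
    ky, kx = kv_idx // GRID, kv_idx % GRID
    # Keep only key/value tokens within a Euclidean radius of the query token.
    return (qy - ky) ** 2 + (qx - kx) ** 2 <= RADIUS ** 2

# Precompute a sparse block mask; FlexAttention skips fully masked blocks.
block_mask = create_block_mask(local_window_mask, B=None, H=None, Q_LEN=N, KV_LEN=N)

q = torch.randn(B, H, N, D, device="cuda", dtype=torch.bfloat16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Each query attends only within its window, so attention cost scales linearly with N.
# In practice flex_attention is wrapped in torch.compile for speed.
out = flex_attention(q, k, v, block_mask=block_mask)
```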
Innovations
- Proposes a convolution-like local attention strategy, the first linearization tailored to pre-trained DiTs.
- Achieves efficient knowledge transfer via knowledge distillation, producing results comparable to the original model (a hedged training-step sketch follows this list).
- Supports multi-GPU parallel inference, substantially improving the efficiency of high-resolution image generation.
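Below is a minimal sketch of what a distillation-style fine-tuning step could look like, assuming a simple output-matching (MSE) objective between teacher and student predictions; the paper's actual loss combination and training pipeline may differ:

```python
import torch
import torch.nn.functional as F

def distill_step(student, teacher, optimizer, latents, timesteps, text_emb):
    """One hypothetical fine-tuning step: only the student's attention layers are
    trainable; all other parameters are frozen and shared with the teacher."""
    with torch.no_grad():
        teacher_pred = teacher(latents, timesteps, text_emb)  # full-attention teacher
    student_pred = student(latents, timesteps, text_emb)      # CLEAR local-attention student
    loss = F.mse_loss(student_pred, teacher_pred)             # match the teacher's prediction
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()
    return loss.item()
```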
Results
Experimental Setup
- Main experiments are conducted on the FLUX model family, replacing all attention layers with CLEAR and fine-tuning on 10K self-generated samples.
- Quantitative evaluation is performed on the COCO2014 validation set using FID, LPIPS, CLIP image similarity, and related metrics.
Key Findings
- When generating 1024×1024 images, CLEAR performs comparably to, or even better than, the original FLUX.1-dev model.
- For 8K-resolution images, attention computation is reduced by 99.5% and generation is accelerated by 6.3x.
- CLEAR supports zero-shot generalization across resolutions and across models/plugins, and improves multi-GPU parallel inference.
Limitations
- In practice, CLEAR's speedup does not fully reach the theoretical expectation, and at lower resolutions it can even be slower than the original DiT.
- Hardware optimization of sparse attention is more challenging than that of full attention; specially optimized CUDA kernels are needed in future work.
Conclusion
- CLEAR successfully linearizes the attention mechanism of pre-trained DiTs with a convolution-like local attention strategy, substantially improving the efficiency of high-resolution image generation.
- Knowledge distillation enables efficient knowledge transfer with only light fine-tuning, yielding results comparable to the original model.
- CLEAR supports zero-shot generalization across resolutions and across models/plugins and improves multi-GPU parallel inference, suggesting broad applicability.