PixelFlow: Pixel-Space Generative Models with Flow
April 10, 2025
Authors: Shoufa Chen, Chongjian Ge, Shilong Zhang, Peize Sun, Ping Luo
cs.AI
Abstract
We present PixelFlow, a family of image generation models that operate
directly in the raw pixel space, in contrast to the predominant latent-space
models. This approach simplifies the image generation process by eliminating
the need for a pre-trained Variational Autoencoder (VAE) and making the whole
model end-to-end trainable. Through efficient cascade flow modeling, PixelFlow
achieves affordable computation cost in pixel space. It achieves an FID of 1.98
on the 256×256 ImageNet class-conditional image generation benchmark. The
qualitative text-to-image results demonstrate that PixelFlow excels in image
quality, artistry, and semantic control. We hope this new paradigm will inspire
and open up new opportunities for next-generation visual generation models.
Code and models are available at https://github.com/ShoufaChen/PixelFlow.