PixelFlow: Pixel-Space Generative Models with Flow

Abstract

We present PixelFlow, a family of image generation models that operate directly in raw pixel space, in contrast to the predominant latent-space models. This approach simplifies the image generation process by eliminating the need for a pre-trained Variational Autoencoder (VAE) and making the whole model end-to-end trainable. Through efficient cascade flow modeling, PixelFlow achieves an affordable computational cost in pixel space. It achieves an FID of 1.98 on the 256×256 ImageNet class-conditional image generation benchmark. The qualitative text-to-image results demonstrate that PixelFlow excels in image quality, artistry, and semantic control. We hope this new paradigm will inspire and open up new opportunities for next-generation visual generation models. Code and models are available at https://github.com/ShoufaChen/PixelFlow.
