PixelFlow: Pixel-Space Generative Models with Flow

April 10, 2025
Authors: Shoufa Chen, Chongjian Ge, Shilong Zhang, Peize Sun, Ping Luo
cs.AI

Abstract

We present PixelFlow, a family of image generation models that operate directly in raw pixel space, in contrast to the predominant latent-space models. This approach simplifies the image generation process by eliminating the need for a pre-trained Variational Autoencoder (VAE) and making the whole model end-to-end trainable. Through efficient cascade flow modeling, PixelFlow keeps the computation cost of working in pixel space affordable. It achieves an FID of 1.98 on the 256×256 ImageNet class-conditional image generation benchmark. Qualitative text-to-image results demonstrate that PixelFlow excels in image quality, artistry, and semantic control. We hope this new paradigm will inspire and open up new opportunities for next-generation visual generation models. Code and models are available at https://github.com/ShoufaChen/PixelFlow.
