PixelFlow: Pixel-Space Generative Models with Flow

April 10, 2025
Authors: Shoufa Chen, Chongjian Ge, Shilong Zhang, Peize Sun, Ping Luo
cs.AI

Abstract

We present PixelFlow, a family of image generation models that operate directly in raw pixel space, in contrast to the predominant latent-space models. This approach simplifies image generation by eliminating the need for a pre-trained Variational Autoencoder (VAE) and making the whole model end-to-end trainable. Through efficient cascade flow modeling, PixelFlow keeps the computation cost in pixel space affordable. It achieves an FID of 1.98 on the 256×256 ImageNet class-conditional image generation benchmark. Qualitative text-to-image results demonstrate that PixelFlow excels in image quality, artistry, and semantic control. We hope this new paradigm will inspire and open up new opportunities for next-generation visual generation models. Code and models are available at https://github.com/ShoufaChen/PixelFlow.
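
The abstract does not spell out the training objective. As a hedged illustration of what "flow modeling directly in pixel space" can look like, the sketch below implements a generic conditional flow-matching (rectified-flow-style) loss and an Euler sampler that act on raw pixel arrays, with no VAE encoder or decoder in the loop. It is not PixelFlow's cascade method: the straight-line noise-to-image path, the function names (interpolate, flow_matching_loss, euler_sample), and the toy single-image "dataset" with its closed-form oracle velocity are all assumptions chosen for illustration.

```python
import numpy as np

def interpolate(x0, x1, t):
    """Straight-line path x_t = (1 - t) * x0 + t * x1 between noise x0 and image x1."""
    t = t.reshape(-1, 1, 1, 1)  # broadcast one time value per sample over (C, H, W)
    return (1.0 - t) * x0 + t * x1

def flow_matching_loss(velocity_fn, x1, rng):
    """Generic conditional flow-matching MSE loss, computed directly on pixel arrays."""
    x0 = rng.standard_normal(x1.shape)            # Gaussian noise with the image's shape
    t = rng.uniform(0.0, 1.0, size=x1.shape[0])   # one time per sample in [0, 1)
    xt = interpolate(x0, x1, t)
    target = x1 - x0                              # velocity of the straight path
    pred = velocity_fn(xt, t)
    return float(np.mean((pred - target) ** 2))

def euler_sample(velocity_fn, shape, steps, rng):
    """Integrate dx/dt = v(x, t) from t = 0 (noise) to t = 1 (image) with Euler steps."""
    x = rng.standard_normal(shape)
    dt = 1.0 / steps
    for i in range(steps):
        t = np.full(shape[0], i * dt)             # current time for every sample
        x = x + dt * velocity_fn(x, t)
    return x

# Hypothetical toy "dataset": a single 3x8x8 image, so the optimal velocity field
# has the closed form v(x, t) = (x_star - x) / (1 - t).
rng = np.random.default_rng(0)
x_star = rng.uniform(-1.0, 1.0, size=(1, 3, 8, 8))

def oracle_velocity(x, t):
    return (x_star - x) / (1.0 - t.reshape(-1, 1, 1, 1))

batch = np.repeat(x_star, 4, axis=0)
loss = flow_matching_loss(oracle_velocity, batch, rng)
sample = euler_sample(oracle_velocity, (1, 3, 8, 8), steps=10, rng=rng)
```

With the oracle velocity the loss is numerically zero and the sampler lands exactly on x_star, which checks that the path, target, and integrator are mutually consistent. In a real model, velocity_fn would be a trained network; operating at full pixel resolution for every step is what makes such models expensive, which is the cost that PixelFlow's cascade design is described as addressing.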
