
Effective and Efficient Masked Image Generation Models

March 10, 2025
Authors: Zebin You, Jingyang Ou, Xiaolu Zhang, Jun Hu, Jun Zhou, Chongxuan Li
cs.AI

Abstract

Although masked image generation models and masked diffusion models are designed with different motivations and objectives, we observe that they can be unified within a single framework. Building upon this insight, we carefully explore the design space of training and sampling, identifying key factors that contribute to both performance and efficiency. Based on the improvements observed during this exploration, we develop our model, referred to as eMIGM. Empirically, eMIGM demonstrates strong performance on ImageNet generation, as measured by Fréchet Inception Distance (FID). In particular, on ImageNet 256x256, with a similar number of function evaluations (NFEs) and model parameters, eMIGM outperforms the seminal VAR. Moreover, as the NFE budget and model parameters increase, eMIGM achieves performance comparable to the state-of-the-art continuous diffusion models while requiring less than 40% of the NFE. Additionally, on ImageNet 512x512, with only about 60% of the NFE, eMIGM outperforms the state-of-the-art continuous diffusion models.
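The NFE counts the abstract compares are the number of network forward passes spent during iterative masked decoding. The following is a minimal, illustrative sketch of that sampling style (MaskGIT-style confidence-based unmasking with a cosine mask schedule) — it is not eMIGM's actual sampler, and `toy_model`, `VOCAB`, and `SEQ_LEN` are hypothetical stand-ins for a trained masked transformer and its token grid:

```python
import math
import random

MASK = -1      # sentinel for a masked token
VOCAB = 16     # toy codebook size (stand-in)
SEQ_LEN = 64   # toy token grid, e.g. an 8x8 latent (stand-in)

def toy_model(tokens):
    """Stand-in for a masked image transformer: returns a (token, confidence)
    guess for every position. A real model would output logits over VOCAB."""
    return [(random.randrange(VOCAB), random.random()) for _ in tokens]

def masked_generate(steps):
    """Iteratively unmask tokens over `steps` forward passes, committing the
    most confident predictions first; the NFE equals `steps`."""
    tokens = [MASK] * SEQ_LEN
    nfe = 0
    for t in range(1, steps + 1):
        preds = toy_model(tokens)
        nfe += 1  # one forward pass per step
        # cosine schedule: fraction of tokens still masked after this step
        frac_masked = math.cos(math.pi / 2 * t / steps)
        keep_masked = int(SEQ_LEN * frac_masked)
        # rank masked positions by confidence; commit all but the least
        # confident `keep_masked` positions
        masked_pos = [i for i, tok in enumerate(tokens) if tok == MASK]
        masked_pos.sort(key=lambda i: preds[i][1], reverse=True)
        for i in masked_pos[: max(len(masked_pos) - keep_masked, 0)]:
            tokens[i] = preds[i][0]
    return tokens, nfe

tokens, nfe = masked_generate(steps=8)
```

The point of the sketch is that many tokens are committed per forward pass, which is why such samplers can match step-by-step diffusion models at a fraction of the NFE.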

