
Personalized Text-to-Image Generation with Auto-Regressive Models

April 17, 2025
Authors: Kaiyue Sun, Xian Liu, Yao Teng, Xihui Liu
cs.AI

Abstract

Personalized image synthesis has emerged as a pivotal application in text-to-image generation, enabling the creation of images featuring specific subjects in diverse contexts. While diffusion models have dominated this domain, auto-regressive models, with their unified architecture for text and image modeling, remain underexplored for personalized image generation. This paper investigates the potential of optimizing auto-regressive models for personalized image synthesis, leveraging their inherent multimodal capabilities to perform this task. We propose a two-stage training strategy that combines optimization of text embeddings and fine-tuning of transformer layers. Our experiments on the auto-regressive model demonstrate that this method achieves comparable subject fidelity and prompt following to the leading diffusion-based personalization methods. The results highlight the effectiveness of auto-regressive models in personalized image generation, offering a new direction for future research in this area.
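
The two-stage strategy described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the parameter names (`embed.subject`, `transformer.w`), the scalar "model", the squared-error loss, and the choice to hold the embedding fixed during the second stage are all assumptions made for illustration. The sketch only shows the idea of first optimizing a text embedding with the rest of the model frozen, then fine-tuning the transformer weights.

```python
from dataclasses import dataclass


@dataclass
class Param:
    """Scalar stand-in for a parameter tensor (hypothetical, for illustration)."""
    name: str
    value: float
    trainable: bool = False


def set_stage(params, stage):
    """Select which parameters are updated in each stage (assumed schedule)."""
    for p in params.values():
        if stage == 1:
            # Stage 1: optimize only the subject token's text embedding.
            p.trainable = p.name == "embed.subject"
        elif stage == 2:
            # Stage 2: fine-tune transformer weights; embedding held fixed (assumption).
            p.trainable = p.name.startswith("transformer.")
        else:
            raise ValueError(f"unknown stage: {stage}")


def loss_and_grads(params, target):
    """Toy loss (w * e - target)^2 and its gradients with respect to e and w."""
    e = params["embed.subject"].value
    w = params["transformer.w"].value
    err = w * e - target
    return err ** 2, {"embed.subject": 2 * err * w, "transformer.w": 2 * err * e}


def train(params, target, stage, steps=20, lr=0.1):
    """Plain gradient descent that updates only the stage's trainable parameters."""
    set_stage(params, stage)
    for _ in range(steps):
        _, grads = loss_and_grads(params, target)
        for p in params.values():
            if p.trainable:
                p.value -= lr * grads[p.name]
    return loss_and_grads(params, target)[0]


params = {
    "embed.subject": Param("embed.subject", 0.1),
    "transformer.w": Param("transformer.w", 1.0),
}
initial_loss = loss_and_grads(params, 2.0)[0]

loss_stage1 = train(params, 2.0, stage=1)
w_after_stage1 = params["transformer.w"].value
embed_after_stage1 = params["embed.subject"].value

loss_stage2 = train(params, 2.0, stage=2)
```

In a real auto-regressive text-to-image model, the same schedule would be expressed by toggling which parameter groups receive gradient updates in each stage; the abstract does not specify which layers are tuned or for how long.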
