Personalized Text-to-Image Generation with Auto-Regressive Models
April 17, 2025
Authors: Kaiyue Sun, Xian Liu, Yao Teng, Xihui Liu
cs.AI
Abstract
Personalized image synthesis has emerged as a pivotal application in
text-to-image generation, enabling the creation of images featuring specific
subjects in diverse contexts. While diffusion models have dominated this
domain, auto-regressive models, with their unified architecture for text and
image modeling, remain underexplored for personalized image generation. This
paper investigates the potential of optimizing auto-regressive models for
personalized image synthesis, leveraging their inherent multimodal capabilities
to perform this task. We propose a two-stage training strategy that combines
optimization of text embeddings with fine-tuning of transformer layers. Our
experiments on an auto-regressive model demonstrate that this method achieves
subject fidelity and prompt following comparable to those of leading
diffusion-based personalization methods. The results highlight the
effectiveness of auto-regressive models in personalized image generation,
offering a new direction for future research in this area.
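The two-stage idea can be illustrated with a toy example. The sketch below is not the authors' code and does not use a real auto-regressive transformer. It assumes a tiny next-token model over a hypothetical 8-entry image-token codebook. A single linear readout `W` stands in for the transformer layers, and `e_text` is the embedding of a new placeholder token (a hypothetical `<sks>`). Stage 1 updates only `e_text`, and stage 2 fine-tunes `W`. Keeping `e_text` fixed during stage 2 is a simplification of this sketch; the abstract does not say whether the authors do this. All names (`loss_and_grads`, `personalize`, `VOCAB`, `DIM`, `BOS`) are illustrative.

```python
import math
import random

# Toy sizes (hypothetical): image-token codebook size and hidden width.
VOCAB, DIM = 8, 4
BOS = VOCAB  # extra row in img_emb used as the "previous token" at position 0


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def loss_and_grads(e_text, img_emb, W, tokens):
    """Mean next-token cross-entropy of a toy auto-regressive model.

    Position k predicts tokens[k] from h = e_text + img_emb[previous token],
    with logits = W @ h. Returns (loss, dL/de_text, dL/dW).
    """
    n = len(tokens)
    loss, g_e = 0.0, [0.0] * DIM
    g_W = [[0.0] * DIM for _ in range(VOCAB)]
    prev = BOS
    for tok in tokens:
        h = [e_text[j] + img_emb[prev][j] for j in range(DIM)]
        p = softmax([sum(W[v][j] * h[j] for j in range(DIM)) for v in range(VOCAB)])
        loss -= math.log(p[tok]) / n
        for v in range(VOCAB):
            g = (p[v] - (1.0 if v == tok else 0.0)) / n  # dL/dlogit_v
            for j in range(DIM):
                g_W[v][j] += g * h[j]
                g_e[j] += g * W[v][j]
        prev = tok
    return loss, g_e, g_W


def personalize(tokens, stage1_steps=100, stage2_steps=100, lr_text=0.1, seed=0):
    rng = random.Random(seed)

    def rand():
        return rng.uniform(-0.5, 0.5)

    img_emb = [[rand() for _ in range(DIM)] for _ in range(VOCAB + 1)]  # frozen in both stages
    W = [[rand() for _ in range(DIM)] for _ in range(VOCAB)]  # stand-in for transformer layers
    e_text = [0.0] * DIM  # embedding of the new placeholder token (hypothetical "<sks>")
    W_init = [row[:] for row in W]
    history = [loss_and_grads(e_text, img_emb, W, tokens)[0]]

    # Stage 1: optimize only the placeholder-token embedding; W stays frozen.
    # With W entries in [-0.5, 0.5], lr_text=0.1 is below 1/L for this convex loss.
    for _ in range(stage1_steps):
        _, g_e, _ = loss_and_grads(e_text, img_emb, W, tokens)
        e_text = [e - lr_text * g for e, g in zip(e_text, g_e)]
    history.append(loss_and_grads(e_text, img_emb, W, tokens)[0])
    W_stage1 = [row[:] for row in W]

    # Stage 2: keep the learned embedding fixed and fine-tune W.
    # 1 / (1 + max ||h||^2) is below the smoothness bound of this convex toy loss,
    # so each gradient step cannot increase it.
    max_h2 = max(sum((e_text[j] + row[j]) ** 2 for j in range(DIM)) for row in img_emb)
    lr_W = 1.0 / (1.0 + max_h2)
    for _ in range(stage2_steps):
        _, _, g_W = loss_and_grads(e_text, img_emb, W, tokens)
        W = [[w - lr_W * g for w, g in zip(w_row, g_row)] for w_row, g_row in zip(W, g_W)]
    history.append(loss_and_grads(e_text, img_emb, W, tokens)[0])
    return {
        "e_text": e_text,
        "W_init": W_init,
        "W_stage1": W_stage1,
        "W_final": W,
        "history": history,
    }
```

For example, `personalize([3, 1, 4, 1, 5])` returns a three-entry `history`: the initial loss, the loss after stage 1, and the loss after stage 2. The loss decreases across the stages. `W_init` and `W_stage1` are identical because `W` is frozen during stage 1. In a real model, the same split is usually expressed by choosing which parameters are passed to the optimizer in each stage.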