Personalized Text-to-Image Generation with Auto-Regressive Models

Abstract

Personalized image synthesis has emerged as a pivotal application in text-to-image generation, enabling the creation of images featuring specific subjects in diverse contexts. While diffusion models have dominated this domain, auto-regressive models, with their unified architecture for text and image modeling, remain underexplored for personalized image generation. This paper investigates the potential of optimizing auto-regressive models for personalized image synthesis, leveraging their inherent multimodal capabilities to perform this task. We propose a two-stage training strategy that combines optimization of text embeddings and fine-tuning of transformer layers. Our experiments on the auto-regressive model demonstrate that this method achieves subject fidelity and prompt following comparable to the leading diffusion-based personalization methods. The results highlight the effectiveness of auto-regressive models in personalized image generation, offering a new direction for future research in this area.
