ChatGen: Automatic Text-to-Image Generation From FreeStyle Chatting

November 26, 2024
Authors: Chengyou Jia, Changliang Xia, Zhuohang Dang, Weijia Wu, Hangwei Qian, Minnan Luo
cs.AI

Abstract

Despite significant advancements in text-to-image (T2I) generative models, users often face a trial-and-error challenge in practical scenarios. This challenge arises from the complexity and uncertainty of tedious steps such as crafting suitable prompts, selecting appropriate models, and configuring specific arguments, which force users to resort to labor-intensive attempts to obtain the desired images. This paper proposes Automatic T2I generation, which aims to automate these tedious steps, allowing users to simply describe their needs through freestyle chatting. To study this problem systematically, we first introduce ChatGenBench, a novel benchmark designed for Automatic T2I. It features high-quality paired data with diverse freestyle inputs, enabling comprehensive evaluation of automatic T2I models across all steps. Additionally, recognizing Automatic T2I as a complex multi-step reasoning task, we propose ChatGen-Evo, a multi-stage evolution strategy that progressively equips models with essential automation skills. Through extensive evaluation of step-wise accuracy and image quality, ChatGen-Evo significantly outperforms various baselines. Our evaluation also uncovers valuable insights for advancing automatic T2I. All our data, code, and models will be available at https://chengyou-jia.github.io/ChatGen-Home
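To make the three automated steps concrete, below is a minimal sketch of what an automatic T2I planning interface could look like. Every name here (`GenerationPlan`, `plan_generation`, the stub heuristics) is a hypothetical illustration for data flow only, not the authors' actual ChatGen API or the ChatGen-Evo training strategy; the real code is at the project page above.

```python
from dataclasses import dataclass, field

@dataclass
class GenerationPlan:
    """The three artifacts a user normally hand-crafts (hypothetical container)."""
    prompt: str  # refined T2I prompt
    model: str   # selected checkpoint name
    args: dict = field(default_factory=dict)  # sampler settings, guidance, etc.

def refine_prompt(chat_input: str) -> str:
    # Stub for step 1 (prompt crafting); a real system would use a fine-tuned LLM.
    return chat_input.strip() + ", highly detailed, best quality"

def select_model(prompt: str) -> str:
    # Stub for step 2 (model selection); a toy keyword router stands in
    # for learned routing over a pool of checkpoints.
    return "anime-checkpoint-v1" if "anime" in prompt.lower() else "realistic-checkpoint-v1"

def configure_args(prompt: str, model: str) -> dict:
    # Stub for step 3 (argument configuration); fixed defaults stand in
    # for per-request predicted settings.
    return {"steps": 30, "cfg_scale": 7.0, "sampler": "DPM++ 2M"}

def plan_generation(chat_input: str) -> GenerationPlan:
    """Map a freestyle chat request to a concrete generation plan."""
    prompt = refine_prompt(chat_input)
    model = select_model(prompt)
    args = configure_args(prompt, model)
    return GenerationPlan(prompt, model, args)

if __name__ == "__main__":
    print(plan_generation("an anime girl walking in the rain, moody vibe"))
```

The point of the sketch is the decomposition itself: each stub corresponds to one of the tedious steps the abstract lists, which is also how ChatGenBench can evaluate models step-wise rather than only on the final image.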