ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation

October 2, 2024
Authors: Rinon Gal, Adi Haviv, Yuval Alaluf, Amit H. Bermano, Daniel Cohen-Or, Gal Chechik
cs.AI

Abstract

The practical use of text-to-image generation has evolved from simple, monolithic models to complex workflows that combine multiple specialized components. While workflow-based approaches can lead to improved image quality, crafting effective workflows requires significant expertise, owing to the large number of available components, their complex inter-dependence, and their dependence on the generation prompt. Here, we introduce the novel task of prompt-adaptive workflow generation, where the goal is to automatically tailor a workflow to each user prompt. We propose two LLM-based approaches to tackle this task: a tuning-based method that learns from user-preference data, and a training-free method that uses the LLM to select existing flows. Both approaches lead to improved image quality when compared to monolithic models or generic, prompt-independent workflows. Our work shows that prompt-dependent flow prediction offers a new pathway to improving text-to-image generation quality, complementing existing research directions in the field.
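
The training-free idea of choosing among existing flows per prompt can be illustrated with a minimal sketch. This is not the authors' implementation: the flow names, the tags, and the keyword-matching `keyword_selector` (a stand-in for the LLM call) are all hypothetical, chosen only to show the shape of a prompt-dependent selection step.

```python
from typing import Callable, Dict, List

# Hypothetical catalogue of existing flows; names and tags are illustrative
# assumptions, not taken from the paper.
FLOWS: Dict[str, List[str]] = {
    "photo_flow": ["photo", "portrait", "realistic"],
    "anime_flow": ["anime", "manga", "cartoon"],
    "generic_flow": [],  # prompt-independent fallback
}


def keyword_selector(prompt: str, flows: Dict[str, List[str]]) -> str:
    """Stand-in for an LLM: pick the flow whose tags overlap the prompt most."""
    words = set(prompt.lower().split())
    best_name, best_hits = "generic_flow", 0
    for name, tags in flows.items():
        hits = len(words & set(tags))
        if hits > best_hits:
            best_name, best_hits = name, hits
    return best_name


def select_flow(
    prompt: str,
    selector: Callable[[str, Dict[str, List[str]]], str] = keyword_selector,
) -> str:
    """Return the name of the flow chosen for this prompt."""
    return selector(prompt, FLOWS)


if __name__ == "__main__":
    for p in ["a realistic portrait of a sailor", "a bowl of fruit"]:
        print(p, "->", select_flow(p))
```

In a real system the `selector` argument would wrap an LLM query that sees the prompt and a description of each candidate flow; keeping it as a pluggable callable makes that substitution straightforward.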
