Piece it Together: Part-Based Concepting with IP-Priors

March 13, 2025
Authors: Elad Richardson, Kfir Goldberg, Yuval Alaluf, Daniel Cohen-Or
cs.AI

Abstract

Advanced generative models excel at synthesizing images but often rely on text-based conditioning. Visual designers, however, frequently work beyond language, drawing inspiration directly from existing visual elements. In many cases, these elements represent only fragments of a potential concept, such as a uniquely structured wing or a specific hairstyle, and serve as inspiration for the artist to explore how they can come together creatively into a coherent whole. Recognizing this need, we introduce a generative framework that seamlessly integrates a partial set of user-provided visual components into a coherent composition while simultaneously sampling the missing parts needed to generate a plausible and complete concept. Our approach builds on a strong and underexplored representation space, extracted from IP-Adapter+, on which we train IP-Prior, a lightweight flow-matching model that synthesizes coherent compositions based on domain-specific priors, enabling diverse and context-aware generations. Additionally, we present a LoRA-based fine-tuning strategy that significantly improves prompt adherence in IP-Adapter+ for a given task, addressing its common trade-off between reconstruction quality and prompt adherence.
