VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning

April 10, 2025
Authors: Zhong-Yu Li, Ruoyi Du, Juncheng Yan, Le Zhuo, Zhen Li, Peng Gao, Zhanyu Ma, Ming-Ming Cheng
cs.AI

Abstract

Recent progress in diffusion models has significantly advanced various image generation tasks. However, the current mainstream approach remains focused on building task-specific models, which is inefficient when a wide range of different needs must be supported. While universal models attempt to address this limitation, they face critical challenges, including generalizable task instruction, appropriate task distributions, and unified architectural design. To tackle these challenges, we propose VisualCloze, a universal image generation framework that supports a wide range of in-domain tasks, generalization to unseen ones, unseen unification of multiple tasks, and reverse generation. Unlike existing methods that rely on language-based task instruction, which leads to task ambiguity and weak generalization, we integrate visual in-context learning, allowing models to identify tasks from visual demonstrations. Meanwhile, the inherent sparsity of visual task distributions hampers the learning of transferable knowledge across tasks. To this end, we introduce Graph200K, a graph-structured dataset that establishes various interrelated tasks, enhancing task density and transferable knowledge. Furthermore, we find that our unified image generation formulation shares a consistent objective with image infilling, enabling us to leverage the strong generative priors of pre-trained infilling models without modifying their architectures.
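
The link between visual in-context learning and image infilling can be pictured as a fill-in-the-blank grid: each demonstration occupies one row of condition and target images, the query row leaves its target cell empty, and an infilling model is asked to complete the masked region. The sketch below is only an illustration of that idea, not the authors' implementation. The function name `build_cloze_grid`, the grayscale list-of-lists image type, and the grid layout are all assumptions made for the example; a real pipeline would use image tensors and a pre-trained infilling model.

```python
from typing import List, Optional, Tuple

# Hypothetical stand-in for an image: rows of grayscale pixel values.
Image = List[List[float]]


def build_cloze_grid(
    rows: List[List[Optional[Image]]],
    cell_hw: Tuple[int, int],
) -> Tuple[List[List[float]], List[List[bool]]]:
    """Tile images into one canvas and mark empty cells for infilling.

    Each entry of `rows` is one task instance (for example: condition
    image, then target image). A cell set to None is the blank that an
    infilling model would be asked to generate.
    """
    h, w = cell_hw
    n_cols = len(rows[0])
    canvas = [[0.0] * (n_cols * w) for _ in range(len(rows) * h)]
    mask = [[False] * (n_cols * w) for _ in range(len(rows) * h)]
    for r, row in enumerate(rows):
        if len(row) != n_cols:
            raise ValueError("every row must have the same number of cells")
        for c, img in enumerate(row):
            for y in range(h):
                for x in range(w):
                    if img is None:
                        # Blank cell: leave the canvas at zero, mask it.
                        mask[r * h + y][c * w + x] = True
                    else:
                        canvas[r * h + y][c * w + x] = img[y][x]
    return canvas, mask


def solid(value: float, h: int = 2, w: int = 2) -> Image:
    """Toy constant-valued image used only for this example."""
    return [[value] * w for _ in range(h)]


# One demonstration row (condition, target) and one query row whose
# target is left blank.
demo = [solid(1.0), solid(2.0)]
query = [solid(3.0), None]
canvas, mask = build_cloze_grid([demo, query], cell_hw=(2, 2))
```

Under these assumptions, the task is never named in text: the demonstration row alone indicates which transformation should fill the masked cell, which is why no architectural change is needed on top of an infilling model.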

