OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision
November 11, 2024
Authors: Cong Wei, Zheyang Xiong, Weiming Ren, Xinrun Du, Ge Zhang, Wenhu Chen
cs.AI
Abstract
Instruction-guided image editing methods have demonstrated significant
potential by training diffusion models on automatically synthesized or manually
annotated image editing pairs. However, these methods remain far from
practical, real-life applications. We identify three primary challenges
contributing to this gap. First, existing models have limited editing skills
because of the biased synthesis process. Second, these methods are trained on
datasets containing a high volume of noise and artifacts, a consequence of
simple filtering methods such as CLIP-score. Third, all these
datasets are restricted to a single low resolution and a fixed aspect ratio,
limiting the versatility to handle real-world use cases. In this paper, we
present OmniEdit, an omnipotent editor that seamlessly handles seven different
image editing tasks at any aspect ratio. Our contributions are
fourfold: (1) OmniEdit is trained with supervision from seven
different specialist models to ensure task coverage. (2) We use importance
sampling based on scores provided by large multimodal models (such as GPT-4o),
instead of CLIP-score, to improve data quality. (3) We propose a new editing
architecture called EditNet to greatly boost the editing success rate. (4) We
provide images with different aspect ratios to ensure that our model can handle
any image in the wild. We have curated a test set containing images of
different aspect ratios, accompanied by diverse instructions to cover different
tasks. Both automatic and human evaluations demonstrate that
OmniEdit significantly outperforms all existing models. Our code,
dataset, and model will be available at
https://tiger-ai-lab.github.io/OmniEdit/
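
Contribution (2) in the abstract, importance sampling driven by quality scores, can be illustrated with a minimal sketch. The abstract does not give the actual procedure, so everything below is an assumption made for illustration: the function name `importance_sample`, the use of non-negative numeric scores as sampling weights, and the choice of weighted sampling without replacement (the Efraimidis-Spirakis key method). It is not the authors' implementation.

```python
import random


def importance_sample(pairs, scores, k, seed=0):
    """Pick up to k distinct editing pairs, favoring higher-scored ones.

    Hypothetical helper: `scores` stands in for quality ratings from a
    large multimodal model and is used directly as a sampling weight.
    """
    if len(pairs) != len(scores):
        raise ValueError("pairs and scores must have the same length")
    rng = random.Random(seed)
    keyed = []
    for pair, score in zip(pairs, scores):
        if score <= 0:
            continue  # zero-weight pairs are never selected
        # Efraimidis-Spirakis key: u ** (1 / w) with u uniform in [0, 1).
        # Larger weights push the key toward 1, so the pair ranks higher.
        key = rng.random() ** (1.0 / score)
        keyed.append((key, pair))
    keyed.sort(key=lambda item: item[0], reverse=True)
    return [pair for _, pair in keyed[:k]]


if __name__ == "__main__":
    # Made-up candidates and scores, purely for demonstration.
    candidates = ["pair_a", "pair_b", "pair_c", "pair_d"]
    ratings = [2.0, 9.0, 8.5, 0.0]
    print(importance_sample(candidates, ratings, k=2))
```

Unlike a hard threshold, this kind of weighting still lets mid-scored pairs through occasionally, which preserves some diversity while concentrating the training set on pairs the scorer rates highly.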