OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision

November 11, 2024
Authors: Cong Wei, Zheyang Xiong, Weiming Ren, Xinrun Du, Ge Zhang, Wenhu Chen
cs.AI

Abstract

Instruction-guided image editing methods have demonstrated significant potential by training diffusion models on automatically synthesized or manually annotated image editing pairs. However, these methods remain far from practical, real-life applications. We identify three primary challenges contributing to this gap. First, existing models have limited editing skills due to the biased synthesis process. Second, these methods are trained on datasets containing a high volume of noise and artifacts, a consequence of simple filtering methods such as CLIP-score. Third, all these datasets are restricted to a single low resolution and a fixed aspect ratio, which limits their versatility in real-world use cases. In this paper, we present OmniEdit, an omnipotent editor that seamlessly handles seven different image editing tasks at any aspect ratio. Our contributions are fourfold: (1) OmniEdit is trained with supervision from seven different specialist models to ensure task coverage; (2) we use importance sampling based on scores provided by large multimodal models (such as GPT-4o) instead of CLIP-score to improve data quality; (3) we propose a new editing architecture called EditNet that greatly boosts the editing success rate; (4) we provide images with different aspect ratios so that our model can handle any image in the wild. We have curated a test set containing images of different aspect ratios, accompanied by diverse instructions covering different tasks. Both automatic and human evaluations demonstrate that OmniEdit can significantly outperform all existing models. Our code, dataset and model will be available at https://tiger-ai-lab.github.io/OmniEdit/
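The abstract does not spell out how the large-multimodal-model scores drive sampling. As an illustration only, the sketch below shows one generic form of score-weighted sampling: each candidate editing pair carries a quality score (assumed here to be a non-negative number from an LMM rater such as GPT-4o), and pairs are drawn without replacement with probability proportional to that score, using the Efraimidis-Spirakis key method. The `EditPair` type, the `score_weighted_sample` function, and the `min_score` cutoff are hypothetical names, not part of OmniEdit's released code, and the weighting scheme itself is an assumption rather than the authors' method.

```python
import random
from dataclasses import dataclass


@dataclass
class EditPair:
    """Hypothetical record for one candidate (source image, instruction, edited image) pair."""
    pair_id: str
    instruction: str
    score: float  # assumed non-negative quality score, e.g. from an LMM rater


def score_weighted_sample(pairs, k, min_score=0.0, seed=0):
    """Draw k pairs without replacement, with probability proportional to score.

    Uses the Efraimidis-Spirakis method: each eligible pair gets the key
    u ** (1 / w) with u ~ Uniform[0, 1), and the k largest keys are kept.
    Pairs whose score is not above max(min_score, 0) get zero weight.
    """
    threshold = max(min_score, 0.0)  # zero or negative weights cannot be sampled
    eligible = [p for p in pairs if p.score > threshold]
    if k > len(eligible):
        raise ValueError("k exceeds the number of eligible pairs")
    rng = random.Random(seed)
    keyed = [(rng.random() ** (1.0 / p.score), p) for p in eligible]
    keyed.sort(key=lambda item: item[0], reverse=True)
    return [p for _, p in keyed[:k]]


if __name__ == "__main__":
    candidates = [
        EditPair("a", "add a hat to the dog", 9.0),
        EditPair("b", "remove the car", 2.0),
        EditPair("c", "make it snowy", 0.0),  # zero score: never drawn
    ]
    for pair in score_weighted_sample(candidates, k=2):
        print(pair.pair_id, pair.instruction, pair.score)
```

Pairs at or below `min_score` are never drawn, so the cutoff behaves like a hard filter, while the score weighting biases the retained set toward higher-rated pairs instead of relying on a single CLIP-score threshold.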
