BrushEdit: All-In-One Image Inpainting and Editing
December 13, 2024
Authors: Yaowei Li, Yuxuan Bian, Xuan Ju, Zhaoyang Zhang, Ying Shan, Qiang Xu
cs.AI
Abstract
Image editing has advanced significantly with the development of diffusion
models using both inversion-based and instruction-based methods. However,
current inversion-based approaches struggle with large modifications (e.g.,
adding or removing objects) due to the structured nature of inversion noise,
which hinders substantial changes. Meanwhile, instruction-based methods often
constrain users to black-box operations, limiting direct interaction for
specifying editing regions and intensity. To address these limitations, we
propose BrushEdit, a novel inpainting-based instruction-guided image editing
paradigm, which leverages multimodal large language models (MLLMs) and image
inpainting models to enable autonomous, user-friendly, and interactive
free-form instruction editing. Specifically, we devise a system enabling
free-form instruction editing by integrating MLLMs and a dual-branch image
inpainting model in an agent-cooperative framework to perform editing category
classification, main object identification, mask acquisition, and editing area
inpainting. Extensive experiments show that our framework effectively combines
MLLMs and inpainting models, achieving superior performance across seven
metrics including mask region preservation and editing effect coherence.
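The four stages named in the abstract (editing category classification, main object identification, mask acquisition, and editing area inpainting) can be pictured as a simple orchestration loop. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. The MLLM calls, the segmenter, and the dual-branch inpainting model are replaced by hypothetical stub callables (`toy_classify`, `toy_identify`, `toy_segment`, `toy_inpaint`). The `EditCategory` labels are illustrative. The final hard-mask `blend` is an assumed way of keeping pixels outside the mask unchanged, which is what the "mask region preservation" metric rewards.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Tuple

# Illustrative types (assumption): a grayscale image as rows of floats,
# and a binary mask of the same shape.
Image = List[List[float]]
Mask = List[List[int]]  # 1 = region to edit, 0 = region to preserve


class EditCategory(Enum):
    # Hypothetical label set for the classification stage.
    ADD = "add"
    REMOVE = "remove"
    LOCAL = "local"
    BACKGROUND = "background"


@dataclass
class EditPlan:
    category: EditCategory
    target: str
    mask: Mask


def blend(original: Image, generated: Image, mask: Mask) -> Image:
    """Hard-mask blend: generated pixels inside the mask, original pixels outside."""
    return [
        [g if m else o for o, g, m in zip(orow, grow, mrow)]
        for orow, grow, mrow in zip(original, generated, mask)
    ]


def run_edit(
    image: Image,
    instruction: str,
    classify: Callable[[str], EditCategory],      # stand-in for an MLLM call
    identify: Callable[[str], str],               # stand-in for an MLLM call
    segment: Callable[[Image, str], Mask],        # stand-in for a mask predictor
    inpaint: Callable[[Image, Mask, str], Image],  # stand-in for the inpainting model
) -> Tuple[Image, EditPlan]:
    category = classify(instruction)               # 1. editing category classification
    target = identify(instruction)                 # 2. main object identification
    mask = segment(image, target)                  # 3. mask acquisition
    generated = inpaint(image, mask, instruction)  # 4. editing area inpainting
    # Copy unmasked pixels from the input so the region outside the mask is kept exactly.
    return blend(image, generated, mask), EditPlan(category, target, mask)


# Hypothetical toy stubs so the sketch runs end to end; real components would be model calls.
def toy_classify(instruction: str) -> EditCategory:
    if instruction.lower().startswith("remove"):
        return EditCategory.REMOVE
    return EditCategory.LOCAL


def toy_identify(instruction: str) -> str:
    # Naively treat the last word of the instruction as the main object.
    return instruction.split()[-1]


def toy_segment(image: Image, target: str) -> Mask:
    # Pretend the target occupies the right half of the image.
    width = len(image[0])
    return [[1 if x >= width // 2 else 0 for x in range(width)] for _ in image]


def toy_inpaint(image: Image, mask: Mask, prompt: str) -> Image:
    # Fill the whole canvas with 0.5; blend() keeps only the masked part of it.
    return [[0.5 for _ in row] for row in image]


if __name__ == "__main__":
    img = [[0.0, 0.1, 0.2, 0.3], [0.4, 0.5, 0.6, 0.7]]
    out, plan = run_edit(img, "remove the dog", toy_classify, toy_identify, toy_segment, toy_inpaint)
    print(plan.category.value, plan.target)  # remove dog
    print(out)  # [[0.0, 0.1, 0.5, 0.5], [0.4, 0.5, 0.5, 0.5]]
```

Because each stage is passed in as a callable, any stub can be swapped for a real model call without changing `run_edit`. The hard blend is the simplest possible choice; a real system might instead soften the mask boundary to avoid visible seams.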