BrushEdit: All-In-One Image Inpainting and Editing
December 13, 2024
Authors: Yaowei Li, Yuxuan Bian, Xuan Ju, Zhaoyang Zhang, Ying Shan, Qiang Xu
cs.AI
Abstract
Image editing has advanced significantly with the development of diffusion
models using both inversion-based and instruction-based methods. However,
current inversion-based approaches struggle with large modifications (e.g.,
adding or removing objects) due to the structured nature of inversion noise,
which hinders substantial changes. Meanwhile, instruction-based methods often
constrain users to black-box operations, limiting direct interaction for
specifying editing regions and intensity. To address these limitations, we
propose BrushEdit, a novel inpainting-based instruction-guided image editing
paradigm, which leverages multimodal large language models (MLLMs) and image
inpainting models to enable autonomous, user-friendly, and interactive
free-form instruction editing. Specifically, we devise a system enabling
free-form instruction editing by integrating MLLMs and a dual-branch image
inpainting model in an agent-cooperative framework to perform editing category
classification, main object identification, mask acquisition, and editing area
inpainting. Extensive experiments show that our framework effectively combines
MLLMs and inpainting models, achieving superior performance across seven
metrics including mask region preservation and editing effect coherence.
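To make the agent-cooperative pipeline concrete, the sketch below lays out the four stages named in the abstract (editing category classification, main object identification, mask acquisition, and editing-area inpainting) as a minimal Python composition. It is an illustration under assumptions, not BrushEdit's actual interface: every function and parameter name is hypothetical, and the MLLM, the segmenter, and the dual-branch inpainting model are supplied by the caller as generic callables.

```python
# Minimal, hypothetical sketch of the agent-cooperative editing pipeline
# outlined in the abstract. Nothing here is BrushEdit's real API: the MLLM,
# the segmenter, and the dual-branch inpainting model are supplied by the
# caller as plain callables.

from dataclasses import dataclass
from typing import Callable

from PIL import Image


@dataclass
class EditPlan:
    category: str        # e.g. "add", "remove", "replace", "modify attribute"
    target_object: str   # main object the instruction refers to
    mask: Image.Image    # binary mask of the region to edit


def plan_edit(
    image: Image.Image,
    instruction: str,
    mllm: Callable[[Image.Image, str], str],
    segmenter: Callable[[Image.Image, str], Image.Image],
) -> EditPlan:
    """Stages 1-3: editing category classification, main object
    identification, and mask acquisition, driven by the MLLM."""
    category = mllm(image, f"Classify the editing type requested by: {instruction}")
    target = mllm(image, f"Name the main object this instruction edits: {instruction}")
    mask = segmenter(image, target)  # e.g. an open-vocabulary segmenter
    return EditPlan(category=category, target_object=target, mask=mask)


def edit_image(
    image: Image.Image,
    instruction: str,
    mllm: Callable[[Image.Image, str], str],
    segmenter: Callable[[Image.Image, str], Image.Image],
    inpainter: Callable[..., Image.Image],
) -> Image.Image:
    """Stage 4: editing-area inpainting. The inpainter receives the source
    image, the acquired mask, and the instruction as a text prompt."""
    plan = plan_edit(image, instruction, mllm, segmenter)
    return inpainter(image=image, mask=plan.mask, prompt=instruction)
```

In practice the `mllm` callable could wrap a vision-language chat model and `inpainter` a mask-conditioned diffusion model such as the dual-branch inpainting network described in the paper; the sketch only shows how the four stages compose, with the mask passed from the planning step to the inpainting step.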