BrushEdit: All-In-One Image Inpainting and Editing
December 13, 2024
Authors: Yaowei Li, Yuxuan Bian, Xuan Ju, Zhaoyang Zhang, Ying Shan, Qiang Xu
cs.AI
Abstract
Image editing has advanced significantly with the development of diffusion
models using both inversion-based and instruction-based methods. However,
current inversion-based approaches struggle with large modifications (e.g.,
adding or removing objects) due to the structured nature of inversion noise,
which hinders substantial changes. Meanwhile, instruction-based methods often
constrain users to black-box operations, limiting direct interaction for
specifying editing regions and intensity. To address these limitations, we
propose BrushEdit, a novel inpainting-based instruction-guided image editing
paradigm, which leverages multimodal large language models (MLLMs) and image
inpainting models to enable autonomous, user-friendly, and interactive
free-form instruction editing. Specifically, we devise a system enabling
free-form instruction editing by integrating MLLMs and a dual-branch image
inpainting model in an agent-cooperative framework to perform editing category
classification, main object identification, mask acquisition, and editing area
inpainting. Extensive experiments show that our framework effectively combines
MLLMs and inpainting models, achieving superior performance across seven
metrics including mask region preservation and editing effect coherence.
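To make the agent-cooperative pipeline concrete, the sketch below lays out the four stages named in the abstract (editing category classification, main object identification, mask acquisition, and editing-area inpainting) as a minimal Python composition. It is an illustration under assumptions, not BrushEdit's actual interface: every function and parameter name is hypothetical, and the MLLM, the segmenter, and the dual-branch inpainting model are supplied by the caller as generic callables.

```python
# Minimal, hypothetical sketch of the agent-cooperative editing pipeline
# outlined in the abstract. Nothing here is BrushEdit's real API: the MLLM,
# the segmenter, and the dual-branch inpainting model are supplied by the
# caller as plain callables.

from dataclasses import dataclass
from typing import Callable

from PIL import Image


@dataclass
class EditPlan:
    category: str        # e.g. "add", "remove", "replace", "modify attribute"
    target_object: str   # main object the instruction refers to
    mask: Image.Image    # binary mask of the region to edit


def plan_edit(
    image: Image.Image,
    instruction: str,
    mllm: Callable[[Image.Image, str], str],
    segmenter: Callable[[Image.Image, str], Image.Image],
) -> EditPlan:
    """Stages 1-3: editing category classification, main object
    identification, and mask acquisition, driven by the MLLM."""
    category = mllm(image, f"Classify the editing type requested by: {instruction}")
    target = mllm(image, f"Name the main object this instruction edits: {instruction}")
    mask = segmenter(image, target)  # e.g. an open-vocabulary segmenter
    return EditPlan(category=category, target_object=target, mask=mask)


def edit_image(
    image: Image.Image,
    instruction: str,
    mllm: Callable[[Image.Image, str], str],
    segmenter: Callable[[Image.Image, str], Image.Image],
    inpainter: Callable[..., Image.Image],
) -> Image.Image:
    """Stage 4: editing-area inpainting. The inpainter receives the source
    image, the acquired mask, and the instruction as a text prompt."""
    plan = plan_edit(image, instruction, mllm, segmenter)
    return inpainter(image=image, mask=plan.mask, prompt=instruction)
```

In practice the `mllm` callable could wrap a vision-language chat model and `inpainter` a mask-conditioned diffusion model such as the dual-branch inpainting network described in the paper; the sketch only shows how the four stages compose, with the mask passed from the planning step to the inpainting step.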