BrushEdit: All-In-One Image Inpainting and Editing

Abstract

Image editing has advanced significantly with the development of diffusion models, through both inversion-based and instruction-based methods. However, current inversion-based approaches struggle with large modifications (e.g., adding or removing objects) because the structured nature of inversion noise hinders substantial changes. Meanwhile, instruction-based methods often constrain users to black-box operations, limiting direct interaction for specifying editing regions and intensity. To address these limitations, we propose BrushEdit, a novel inpainting-based, instruction-guided image editing paradigm that leverages multimodal large language models (MLLMs) and image inpainting models to enable autonomous, user-friendly, and interactive free-form instruction editing. Specifically, we devise a system that integrates MLLMs and a dual-branch image inpainting model in an agent-cooperative framework to perform editing category classification, main object identification, mask acquisition, and editing area inpainting. Extensive experiments show that our framework effectively combines MLLMs and inpainting models, achieving superior performance across seven metrics, including mask region preservation and editing effect coherence.
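
As a rough illustration of the four stages named above (editing category classification, main object identification, mask acquisition, and editing area inpainting), the following minimal Python sketch chains them together. It is not the BrushEdit implementation: the names `EditPlan`, `plan_edit`, and `apply_edit`, the category labels, the toy list-of-lists image type, and the final compositing step that copies original pixels back outside the mask are all assumptions made for illustration. The MLLM, mask, and inpainting components are passed in as plain callables so the sketch stays self-contained.

```python
from dataclasses import dataclass
from typing import Callable, List

# Toy stand-ins (assumptions): a grayscale image as rows of ints,
# and a mask where True marks pixels that may be repainted.
Image = List[List[int]]
Mask = List[List[bool]]


@dataclass
class EditPlan:
    category: str  # hypothetical label, e.g. "add" or "remove"
    target: str    # main object named in the instruction
    mask: Mask     # region the inpainting model may change


def plan_edit(
    image: Image,
    instruction: str,
    classify: Callable[[str], str],
    identify: Callable[[str, str], str],
    segment: Callable[[Image, str], Mask],
) -> EditPlan:
    """Stages 1 to 3: classify the edit, find the main object, get a mask."""
    category = classify(instruction)
    target = identify(instruction, category)
    mask = segment(image, target)
    return EditPlan(category, target, mask)


def apply_edit(
    image: Image,
    plan: EditPlan,
    inpaint: Callable[[Image, Mask, str], Image],
    prompt: str,
) -> Image:
    """Stage 4: inpaint the masked region, then keep original pixels elsewhere."""
    generated = inpaint(image, plan.mask, prompt)
    # Compositing is an assumption of this sketch: it guarantees that
    # pixels outside the mask are returned unchanged.
    return [
        [g if m else o for o, g, m in zip(orig_row, gen_row, mask_row)]
        for orig_row, gen_row, mask_row in zip(image, generated, plan.mask)
    ]
```

In practice, `classify` and `identify` would wrap calls to an MLLM, `segment` a detection or segmentation model, and `inpaint` an inpainting model. A user-supplied mask could replace the `segment` step, which is one way the interactive control over editing regions described above might be exposed.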
