InstantDrag: Improving Interactivity in Drag-based Image Editing

Abstract

Drag-based image editing has recently gained popularity for its interactivity and precision. However, despite the ability of text-to-image models to generate samples within a second, drag editing still lags behind due to the challenge of accurately reflecting user interaction while maintaining image content. Some existing approaches rely on computationally intensive per-image optimization or intricate guidance-based methods, requiring additional inputs such as masks for movable regions and text prompts, thereby compromising the interactivity of the editing process. We introduce InstantDrag, an optimization-free pipeline that enhances interactivity and speed, requiring only an image and a drag instruction as input. InstantDrag consists of two carefully designed networks: a drag-conditioned optical flow generator (FlowGen) and an optical flow-conditioned diffusion model (FlowDiffusion). InstantDrag learns motion dynamics for drag-based image editing in real-world video datasets by decomposing the task into motion generation and motion-conditioned image generation. We demonstrate InstantDrag's capability to perform fast, photo-realistic edits without masks or text prompts through experiments on facial video datasets and general scenes. These results highlight the efficiency of our approach in handling drag-based image editing, making it a promising solution for interactive, real-time applications.
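As a rough illustration of the two-stage decomposition described above (motion generation, then motion-conditioned image generation), the following Python sketch chains two toy stand-ins. It is not the authors' implementation: `toy_flow_gen`, `toy_flow_render`, and `toy_instant_drag` are hypothetical names, the Gaussian falloff is an assumed substitute for the learned FlowGen network, and the nearest-pixel warp is an assumed substitute for the FlowDiffusion model. Only the interface mirrors the abstract: an image and drag instructions go in, with no mask or text prompt.

```python
import math


def toy_flow_gen(height, width, drags, sigma=0.5):
    """Stage 1 stand-in: turn sparse drags into a dense flow field.

    drags is a list of ((x0, y0), (x1, y1)) pairs in pixel coordinates.
    Each drag's displacement is spread with a Gaussian falloff around its
    start point (an assumption for illustration, not a learned model).
    Returns flow[y][x] = (dx, dy).
    """
    flow = [[(0.0, 0.0)] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx = dy = 0.0
            for (x0, y0), (x1, y1) in drags:
                dist_sq = (x - x0) ** 2 + (y - y0) ** 2
                weight = math.exp(-dist_sq / (2 * sigma ** 2))
                dx += weight * (x1 - x0)
                dy += weight * (y1 - y0)
            flow[y][x] = (dx, dy)
    return flow


def toy_flow_render(image, flow, background=0):
    """Stage 2 stand-in: move pixels along the rounded flow vectors.

    A real flow-conditioned generator would synthesize plausible content;
    this toy version only relocates pixels and fills vacated spots with
    `background`. On collisions, the later write wins.
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    moves = []
    for y in range(height):
        for x in range(width):
            dx, dy = flow[y][x]
            tx, ty = x + round(dx), y + round(dy)
            if (tx, ty) != (x, y) and 0 <= tx < width and 0 <= ty < height:
                moves.append((x, y, tx, ty))
    for x, y, _, _ in moves:
        out[y][x] = background  # vacate every source pixel first
    for x, y, tx, ty in moves:
        out[ty][tx] = image[y][x]  # then write the moved pixels
    return out


def toy_instant_drag(image, drags):
    """Chain the two stages: image + drags in, edited image out."""
    flow = toy_flow_gen(len(image), len(image[0]), drags)
    return toy_flow_render(image, flow)


# Usage: drag the single bright pixel two columns to the right.
image = [[0] * 5 for _ in range(5)]
image[2][1] = 9
edited = toy_instant_drag(image, [((1, 2), (3, 2))])
```

Keeping the two stages behind separate functions mirrors the split the abstract describes: either stand-in could be swapped for a learned model without changing the caller.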
