Pathways on the Image Manifold: Image Editing via Video Generation

Abstract


Image editing has advanced rapidly, driven by image diffusion models. However, significant challenges remain: these models often struggle to follow complex edit instructions accurately, and they frequently compromise fidelity by altering key elements of the original image. Meanwhile, video generation has made remarkable strides, producing models that effectively function as consistent and continuous world simulators. In this paper, we propose merging these two fields by using image-to-video models for image editing. We reformulate image editing as a temporal process, using pretrained video models to create smooth transitions from the original image to the desired edit. Because this approach traverses the image manifold continuously, it yields consistent edits while preserving the original image's key aspects. Our approach achieves state-of-the-art results on text-based image editing, with significant improvements in both edit accuracy and image preservation.
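The idea of editing as a path through the image manifold can be illustrated with a minimal sketch. It assumes a hypothetical `video_model` callable standing in for a pretrained image-to-video model, and a hypothetical `edit_score` function (in practice this might be something like a text-image similarity model). The abstract does not say how the final edited image is extracted from the generated video; here we simply assume one picks the frame that best balances edit accuracy against preservation of the source, with preservation crudely measured by mean squared error and weighted by a hypothetical `preservation_weight`. The toy model and score are placeholders, not the paper's components.

```python
from typing import Callable, List, Sequence

# A toy "image": a flat list of pixel intensities in [0, 1].
Image = List[float]

# Hypothetical interface for a pretrained image-to-video model: given the source
# image, an edit prompt, and a frame count, return frames starting at the source.
VideoModel = Callable[[Image, str, int], List[Image]]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error, used as a crude proxy for departure from the source."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def edit_via_video(
    source: Image,
    prompt: str,
    video_model: VideoModel,
    edit_score: Callable[[Image, str], float],
    num_frames: int = 16,
    preservation_weight: float = 1.0,
) -> Image:
    """Treat the edit as a path from the source image and return the frame with
    the best trade-off between edit accuracy and preservation of the source.
    The frame-selection rule is an assumption made for this sketch."""
    frames = video_model(source, prompt, num_frames)
    # Frame 0 is (approximately) the unedited source, so only later frames compete.
    candidates = frames[1:] or frames
    # max() returns the first frame among ties, i.e. the earliest along the path.
    return max(
        candidates,
        key=lambda f: edit_score(f, prompt) - preservation_weight * mse(f, source),
    )


# --- Toy placeholders (hypothetical, for illustration only) ---

def toy_video_model(source: Image, prompt: str, num_frames: int) -> List[Image]:
    """Ignores the prompt and brightens every pixel linearly toward 1.0 over time."""
    return [
        [s + (1.0 - s) * i / (num_frames - 1) for s in source]
        for i in range(num_frames)
    ]


def toy_edit_score(frame: Image, prompt: str) -> float:
    """Rewards mean brightness, saturating at 0.8 ("bright enough")."""
    return min(sum(frame) / len(frame), 0.8)


best = edit_via_video([0.2, 0.4], "make it brighter", toy_video_model,
                      toy_edit_score, num_frames=6)
print([round(x, 2) for x in best])  # → [0.68, 0.76]
```

Setting `preservation_weight=0.0` in the same call returns the earliest frame that saturates the toy score, `[0.84, 0.88]`, which shows how the weight trades edit strength against fidelity to the original image.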
