FluxSpace: Disentangled Semantic Editing in Rectified Flow Transformers

December 12, 2024
Authors: Yusuf Dalva, Kavana Venkatesh, Pinar Yanardag
cs.AI

Abstract

Rectified flow models have emerged as a dominant approach in image generation, showing impressive capabilities in high-quality image synthesis. Despite their effectiveness in visual generation, however, they often struggle with disentangled editing of images. This limitation makes it difficult to perform precise, attribute-specific modifications without affecting unrelated aspects of the image. In this paper, we introduce FluxSpace, a domain-agnostic image editing method that leverages a representation space able to control the semantics of images generated by rectified flow transformers, such as Flux. Building on the representations learned by the transformer blocks within rectified flow models, we propose a set of semantically interpretable representations that enable a wide range of image editing tasks, from fine-grained image editing to artistic creation. This work offers a scalable and effective approach to image editing with disentanglement capabilities.
