FluxSpace: Disentangled Semantic Editing in Rectified Flow Transformers

December 12, 2024
Authors: Yusuf Dalva, Kavana Venkatesh, Pinar Yanardag
cs.AI

Abstract

Rectified flow models have emerged as a dominant approach in image generation, showcasing impressive capabilities in high-quality image synthesis. However, despite their effectiveness in visual generation, rectified flow models often struggle with disentangled editing of images. This limitation hinders precise, attribute-specific modifications that leave unrelated aspects of the image unaffected. In this paper, we introduce FluxSpace, a domain-agnostic image editing method that leverages a representation space capable of controlling the semantics of images generated by rectified flow transformers, such as Flux. By leveraging the representations learned by the transformer blocks within rectified flow models, we propose a set of semantically interpretable representations that enable a wide range of image editing tasks, from fine-grained image editing to artistic creation. This work offers a scalable and effective approach to image editing with disentanglement capabilities.
