SliderSpace: Decomposing the Visual Capabilities of Diffusion Models

February 3, 2025
Authors: Rohit Gandikota, Zongze Wu, Richard Zhang, David Bau, Eli Shechtman, Nick Kolkin
cs.AI

Abstract

We present SliderSpace, a framework for automatically decomposing the visual capabilities of diffusion models into controllable and human-understandable directions. Unlike existing control methods that require a user to specify attributes for each edit direction individually, SliderSpace discovers multiple interpretable and diverse directions simultaneously from a single text prompt. Each direction is trained as a low-rank adaptor, enabling compositional control and the discovery of surprising possibilities in the model's latent space. Through extensive experiments on state-of-the-art diffusion models, we demonstrate SliderSpace's effectiveness across three applications: concept decomposition, artistic style exploration, and diversity enhancement. Our quantitative evaluation shows that SliderSpace-discovered directions decompose the visual structure of the model's knowledge effectively, offering insights into the latent capabilities encoded within diffusion models. User studies further validate that our method produces more diverse and useful variations compared to baselines. Our code, data, and trained weights are available at https://sliderspace.baulab.info.
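
To make the idea of composable low-rank "slider" directions concrete, below is a minimal PyTorch sketch of how such adapters could be applied at inference time: each discovered direction contributes a rank-r additive weight update, scaled by a user-chosen slider value. This is an illustrative toy under assumed conventions, not the released SliderSpace implementation; the class name `SliderComposedLinear`, the ranks, and the slider scales are invented for the example, and the paper's actual training and direction-discovery procedure is not shown.

```python
import torch
import torch.nn as nn

class SliderComposedLinear(nn.Module):
    """Illustrative sketch (not the paper's code): a frozen pretrained linear
    layer whose output is modulated by several low-rank slider directions.
    Direction i is a rank-r pair (A_i, B_i); a user-chosen scale s_i moves the
    output along it. Effective weight: W + sum_i s_i * (B_i @ A_i)."""

    def __init__(self, base: nn.Linear, num_sliders: int = 3, rank: int = 4):
        super().__init__()
        self.base = base  # frozen pretrained layer
        for p in self.base.parameters():
            p.requires_grad_(False)
        out_f, in_f = base.weight.shape
        # One low-rank adapter per direction; B starts at zero so each slider
        # initially contributes nothing (a common LoRA-style convention).
        self.A = nn.ParameterList(
            [nn.Parameter(torch.randn(rank, in_f) * 0.01) for _ in range(num_sliders)]
        )
        self.B = nn.ParameterList(
            [nn.Parameter(torch.zeros(out_f, rank)) for _ in range(num_sliders)]
        )

    def forward(self, x: torch.Tensor, scales: list[float]) -> torch.Tensor:
        y = self.base(x)
        # Compositional control: each direction adds an independent, scaled
        # low-rank update, so directions can be mixed freely.
        for s, A, B in zip(scales, self.A, self.B):
            if s != 0.0:
                y = y + s * (x @ A.t() @ B.t())
        return y

# Usage: wrap a pretrained layer and blend two hypothetical directions.
layer = SliderComposedLinear(nn.Linear(64, 64), num_sliders=3, rank=4)
x = torch.randn(1, 64)
out = layer(x, scales=[0.8, -0.5, 0.0])  # mix directions 0 and 1, ignore 2
```

Because each direction is an independent additive update, setting its scale to zero removes it entirely, and several directions can be combined simply by summing their scaled contributions, which is what makes the discovered sliders composable.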
