SliderSpace: Decomposing the Visual Capabilities of Diffusion Models

Abstract

We present SliderSpace, a framework for automatically decomposing the visual capabilities of diffusion models into controllable and human-understandable directions. Unlike existing control methods that require a user to specify attributes for each edit direction individually, SliderSpace discovers multiple interpretable and diverse directions simultaneously from a single text prompt. Each direction is trained as a low-rank adaptor, enabling compositional control and the discovery of surprising possibilities in the model's latent space. Through extensive experiments on state-of-the-art diffusion models, we demonstrate SliderSpace's effectiveness across three applications: concept decomposition, artistic style exploration, and diversity enhancement. Our quantitative evaluation shows that SliderSpace-discovered directions effectively decompose the visual structure of the model's knowledge, offering insights into the latent capabilities encoded within diffusion models. User studies further validate that our method produces more diverse and useful variations than the baselines. Our code, data, and trained weights are available at https://sliderspace.baulab.info
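
To make "compositional control" concrete, the following is a minimal sketch of how several low-rank adaptors could be combined on a single weight matrix, using the standard low-rank update W' = W + sum_i s_i * (B_i A_i). It is an illustration under stated assumptions, not the released SliderSpace implementation: the function names, the (B, A) pair layout, and the toy 2x2 matrices are all hypothetical, and the abstract does not specify how slider scales are applied.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of a (m x k) and b (k x n)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def compose_sliders(
    base: Matrix,
    adapters: Sequence[Tuple[Matrix, Matrix]],
    scales: Sequence[float],
) -> Matrix:
    """Return base + sum_i scales[i] * (B_i @ A_i).

    Each adapter is a hypothetical (B, A) pair with B of shape (out x r)
    and A of shape (r x in), so B @ A is a rank-r update to `base`.
    """
    if len(adapters) != len(scales):
        raise ValueError("need exactly one scale per adapter")
    result = [row[:] for row in base]  # copy so the base weights stay untouched
    for (b, a), scale in zip(adapters, scales):
        delta = matmul(b, a)
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                result[i][j] += scale * value
    return result


# Toy 2x2 "layer" with two rank-1 sliders (illustrative values only).
base = [[1.0, 0.0], [0.0, 1.0]]
slider_1 = ([[1.0], [0.0]], [[0.0, 1.0]])  # only changes entry (0, 1)
slider_2 = ([[0.0], [1.0]], [[1.0, 0.0]])  # only changes entry (1, 0)

composed = compose_sliders(base, [slider_1, slider_2], scales=[0.5, -2.0])
```

Because each slider contributes an additive update, its scale can be adjusted independently of the others, and setting every scale to zero recovers the base weights.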
