The Superposition of Diffusion Models Using the Itô Density Estimator

Abstract

The Cambrian explosion of easily accessible pre-trained diffusion models suggests a demand for methods that combine multiple different pre-trained diffusion models without incurring the significant computational burden of re-training a larger combined model. In this paper, we cast the problem of combining multiple pre-trained diffusion models at the generation stage under a newly proposed framework termed superposition. Theoretically, we derive superposition from rigorous first principles stemming from the celebrated continuity equation and design two novel algorithms tailor-made for combining diffusion models in SuperDiff. SuperDiff leverages a new scalable Itô density estimator for the log-likelihood of the diffusion SDE, which incurs no additional overhead compared to the well-known Hutchinson's estimator needed for divergence calculations. We demonstrate that SuperDiff is scalable to large pre-trained diffusion models, as superposition is performed solely through composition during inference, and that it is painless to implement, as it combines different pre-trained vector fields through an automated re-weighting scheme. Notably, we show that SuperDiff is efficient at inference time and mimics traditional composition operators such as the logical OR and the logical AND. We empirically demonstrate the utility of SuperDiff for generating more diverse images on CIFAR-10, more faithful prompt-conditioned image editing using Stable Diffusion, and improved unconditional de novo structure design of proteins. https://github.com/necludov/super-diffusion
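The abstract measures the cost of the Itô density estimator against Hutchinson's estimator for divergence calculations. For readers unfamiliar with that baseline, the following is a minimal sketch of Hutchinson's estimator only, not of SuperDiff or its Itô estimator. It relies on the identity div f(x) = tr(J_f(x)) = E[v^T J_f(x) v] for random probes v with zero mean and identity covariance. The function name `hutchinson_divergence`, the toy linear vector field, and the finite-difference Jacobian-vector product are all illustrative assumptions; with a neural vector field one would typically use automatic differentiation instead.

```python
import numpy as np


def hutchinson_divergence(f, x, num_probes=2000, eps=1e-4, seed=0):
    """Monte Carlo estimate of div f(x) = tr(J_f(x)) with Rademacher probes.

    Hypothetical helper for illustration; not taken from the SuperDiff code.
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_probes):
        # Rademacher probe: entries are +1 or -1 with equal probability.
        v = rng.choice([-1.0, 1.0], size=x.shape)
        # Central finite difference approximates the product J_f(x) v.
        jvp = (f(x + eps * v) - f(x - eps * v)) / (2.0 * eps)
        total += float(v @ jvp)
    return total / num_probes


# Toy vector field f(x) = A x, whose exact divergence is trace(A).
A = np.array([[1.0, 2.0, 0.0],
              [0.0, -3.0, 1.0],
              [4.0, 0.0, 0.5]])


def linear_field(x):
    return A @ x


x0 = np.array([0.3, -1.2, 2.0])
estimate = hutchinson_divergence(linear_field, x0)
exact = float(np.trace(A))
print(f"estimate={estimate:.3f}, exact={exact:.3f}")
```

Each probe costs one Jacobian-vector product rather than a full Jacobian, which is why this estimator is the usual reference point for divergence-based likelihood computation in high dimensions. The estimate is stochastic, so it approaches the exact trace only as the number of probes grows.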

