NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training

Abstract

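We introduce NitroFusion, a fundamentally different approach to single-step diffusion that achieves high-quality generation through a dynamic adversarial framework. While one-step methods offer dramatic speed advantages, they typically suffer from quality degradation compared to their multi-step counterparts. Just as a panel of art critics provides comprehensive feedback by specializing in different aspects such as composition, color, and technique, our approach maintains a large pool of specialized discriminator heads that collectively guide the generation process. Each discriminator group develops expertise in specific quality aspects at different noise levels, providing diverse feedback that enables high-fidelity one-step generation. Our framework combines (i) a dynamic discriminator pool with specialized discriminator groups to improve generation quality, (ii) strategic refresh mechanisms to prevent discriminator overfitting, and (iii) global-local discriminator heads for multi-scale quality assessment, together with unconditional/conditional training for balanced generation. Additionally, our framework uniquely supports flexible deployment through bottom-up refinement, allowing users to choose dynamically between 1 and 4 denoising steps with the same model for a direct quality-speed trade-off. Through comprehensive experiments, we demonstrate that NitroFusion significantly outperforms existing single-step methods across multiple evaluation metrics, particularly in preserving fine details and global consistency.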
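
The dynamic discriminator pool, its refresh mechanism, and the global-local heads described above can be illustrated with a minimal bookkeeping sketch. This is not the authors' implementation. It makes several assumptions: each head is a hypothetical linear scorer over per-patch feature vectors, standing in for a small network on shared backbone features. Heads are grouped by illustrative noise levels, and all gradient updates are omitted. The names `Head`, `DiscriminatorPool`, `refresh`, and `adversarial_signal`, along with every numeric setting, are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Head:
    """Hypothetical discriminator head: a linear scorer over per-patch features.

    Stands in for a small network on top of shared backbone features (assumption).
    """
    weights: list  # one weight per feature channel

    def _dot(self, vec):
        return sum(w * x for w, x in zip(self.weights, vec))

    def local_logits(self, patches):
        # Local assessment: one real/fake logit per spatial patch.
        return [self._dot(p) for p in patches]

    def global_logit(self, patches):
        # Global assessment: one logit for the mean-pooled feature map.
        dim = len(patches[0])
        pooled = [sum(p[c] for p in patches) / len(patches) for c in range(dim)]
        return self._dot(pooled)


class DiscriminatorPool:
    """Heads grouped by noise level, sampled per step and periodically refreshed.

    Training (gradient updates to heads and generator) is deliberately omitted.
    """

    def __init__(self, noise_levels, heads_per_level, feature_dim, seed=0):
        self.rng = random.Random(seed)
        self.feature_dim = feature_dim
        # One group per noise level, so each group can specialize at that level.
        self.groups = {
            t: [self._new_head() for _ in range(heads_per_level)]
            for t in noise_levels
        }

    def _new_head(self):
        # Small random init; the 0.02 scale is an arbitrary assumption.
        return Head([self.rng.gauss(0.0, 0.02) for _ in range(self.feature_dim)])

    def sample(self, noise_level, k):
        """Pick k distinct heads from the group assigned to this noise level."""
        return self.rng.sample(self.groups[noise_level], k)

    def refresh(self, fraction):
        """Re-initialize a random fraction of every group; return the number replaced."""
        refreshed = 0
        for heads in self.groups.values():
            n = int(len(heads) * fraction)
            for i in self.rng.sample(range(len(heads)), n):
                heads[i] = self._new_head()
                refreshed += 1
        return refreshed

    def adversarial_signal(self, noise_level, patches, k):
        """Average the global and per-patch logits over k sampled heads."""
        heads = self.sample(noise_level, k)
        global_score = sum(h.global_logit(patches) for h in heads) / k
        local_scores = [
            sum(per_head) / k
            for per_head in zip(*(h.local_logits(patches) for h in heads))
        ]
        return global_score, local_scores


if __name__ == "__main__":
    # Hypothetical noise levels (timesteps) and pool sizes, for illustration only.
    pool = DiscriminatorPool([250, 500, 750, 999], heads_per_level=40, feature_dim=8)
    patches = [[0.1] * 8, [0.2] * 8, [0.3] * 8]
    g, local = pool.adversarial_signal(500, patches, k=4)
    print(round(g, 6), [round(x, 6) for x in local])
    print(pool.refresh(0.1))  # 4 heads per group x 4 groups = 16
```

In this sketch, each call draws a different subset of heads from the group for the current noise level, so no single head dominates the feedback. Calling `refresh(0.1)` replaces 10% of each group with freshly initialized heads, which mirrors the stated goal of preventing discriminators from overfitting. Because the toy heads are linear, the global logit equals the mean of the local logits. In a real model with nonlinear heads, the two signals would differ, which is what makes multi-scale assessment useful.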
