Diffusion Curriculum: Synthetic-to-Real Generative Curriculum Learning via Image-Guided Diffusion

Abstract

Low-quality or scarce data poses significant challenges for training deep neural networks in practice. While classical data augmentation cannot contribute substantially different new data, diffusion models open a new door to building self-evolving AI by generating high-quality, diverse synthetic data through text-guided prompts. However, text-only guidance cannot control how close the synthetic images are to the original images, which can produce out-of-distribution data that is detrimental to model performance. To overcome this limitation, we study image guidance to achieve a spectrum of interpolations between synthetic and real images. With stronger image guidance, the generated images are similar to the training data but hard to learn. With weaker image guidance, the synthetic images are easier for the model but have a larger distribution gap from the original data. The full spectrum of generated data enables us to build a novel "Diffusion Curriculum (DisCL)". DisCL adjusts the image guidance level of image synthesis for each training stage: it identifies and focuses on hard samples for the model and assesses the most effective guidance level of synthetic images to improve hard-data learning. We apply DisCL to two challenging tasks: long-tail (LT) classification and learning from low-quality data. It focuses on lower-guidance, high-quality images to learn prototypical features as a warm-up for learning higher-guidance images that might be weak in diversity or quality. Extensive experiments show gains of 2.7% and 2.1% in OOD and ID macro-accuracy, respectively, when applying DisCL to the iWildCam dataset. On ImageNet-LT, DisCL improves the base model's tail-class accuracy from 4.4% to 23.64% and leads to a 4.02% improvement in all-class accuracy.
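
The curriculum idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `select_hard_samples` and `synthetic_to_real_schedule`, the confidence threshold used to define a "hard" sample, and the fixed weak-to-strong ordering of guidance levels are all assumptions made for illustration. The abstract describes choosing the most effective guidance level per training stage, which may be adaptive rather than fixed.

```python
from typing import Dict, List, Sequence


def select_hard_samples(confidences: Dict[str, float],
                        threshold: float = 0.5) -> List[str]:
    """Return ids of samples the model finds hard.

    Assumption: "hard" means the model's confidence on the true class
    falls below `threshold`. The abstract does not specify the criterion.
    """
    return sorted(sid for sid, conf in confidences.items() if conf < threshold)


def synthetic_to_real_schedule(guidance_levels: Sequence[float],
                               num_stages: int) -> List[float]:
    """Assign one image-guidance level to each training stage.

    Assumption: stages move monotonically from the weakest guidance
    (most synthetic, easier) to the strongest (closest to real, harder).
    """
    if num_stages < 1 or not guidance_levels:
        raise ValueError("need at least one stage and one guidance level")
    levels = sorted(guidance_levels)
    if num_stages == 1:
        return [levels[-1]]
    last = len(levels) - 1
    # Integer arithmetic spreads stages evenly over the sorted levels.
    return [levels[i * last // (num_stages - 1)] for i in range(num_stages)]


if __name__ == "__main__":
    hard = select_hard_samples({"img_a": 0.9, "img_b": 0.2, "img_c": 0.4})
    schedule = synthetic_to_real_schedule([0.9, 0.1, 0.5], num_stages=3)
    for stage, level in enumerate(schedule):
        # A real pipeline would train here on synthetic variants of `hard`
        # generated at guidance level `level`.
        print(f"stage {stage}: guidance={level}, hard samples={hard}")
```

In this sketch each stage would train on synthetic variants of the hard samples generated at the scheduled guidance level. The image generation and training steps are omitted.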
