

Diffusion Curriculum: Synthetic-to-Real Generative Curriculum Learning via Image-Guided Diffusion

October 17, 2024
作者: Yijun Liang, Shweta Bhardwaj, Tianyi Zhou
cs.AI

Abstract

Low-quality or scarce data has posed significant challenges for training deep neural networks in practice. While classical data augmentation cannot contribute substantially different new data, diffusion models open up a new door to building self-evolving AI by generating high-quality, diverse synthetic data through text-guided prompts. However, text-only guidance cannot control the synthetic images' proximity to the original images, resulting in out-of-distribution data detrimental to model performance. To overcome this limitation, we study image guidance to achieve a spectrum of interpolations between synthetic and real images. With stronger image guidance, the generated images are similar to the training data but hard to learn from; with weaker image guidance, the synthetic images are easier for the model but widen the distribution gap with the original data. The generated full spectrum of data enables us to build a novel "Diffusion Curriculum (DisCL)". DisCL adjusts the image-guidance level of image synthesis for each training stage: it identifies and focuses on hard samples for the model and assesses the most effective guidance level of synthetic images to improve learning on hard data. We apply DisCL to two challenging tasks: long-tail (LT) classification and learning from low-quality data. It focuses on high-quality, lower-guidance images to learn prototypical features as a warm-up for learning higher-guidance images that might be weaker in diversity or quality. Extensive experiments show a gain of 2.7% and 2.1% in OOD and ID macro-accuracy when applying DisCL to the iWildCam dataset. On ImageNet-LT, DisCL improves the base model's tail-class accuracy from 4.4% to 23.64% and yields a 4.02% improvement in all-class accuracy.
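
The synthetic-to-real spectrum and the stage-wise curriculum described above can be illustrated with an off-the-shelf image-to-image diffusion pipeline, where a strength parameter controls how far a synthetic image may drift from the real one. The sketch below is a minimal illustration only, assuming the Hugging Face diffusers img2img API as a stand-in for the paper's image-guided generator; the guidance levels, prompts, `hard_samples` list, and stage schedule are hypothetical placeholders, not the authors' implementation.

```python
# Minimal sketch: build a spectrum of synthetic-to-real images for hard
# samples at several image-guidance levels, then pick one level per
# training stage (illustrative, not the authors' code).
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Hypothetical image-guidance levels: higher values keep the synthetic image
# closer to the real one, lower values rely more on the text prompt.
GUIDANCE_LEVELS = [0.1, 0.4, 0.7, 0.9]

def generate_spectrum(image_path, prompt):
    """Generate one synthetic image per guidance level for a real image."""
    real = Image.open(image_path).convert("RGB").resize((512, 512))
    spectrum = {}
    for g in GUIDANCE_LEVELS:
        # In diffusers img2img, `strength` is the fraction of noise added to
        # the input, so stronger image guidance maps to a lower strength.
        synthetic = pipe(prompt=prompt, image=real, strength=1.0 - g).images[0]
        spectrum[g] = synthetic
    return spectrum

def build_synthetic_pool(hard_samples):
    """Precompute the full spectrum for every (image_path, class_prompt)
    pair flagged as hard for the current model."""
    return {path: generate_spectrum(path, prompt) for path, prompt in hard_samples}

# Hypothetical curriculum schedule: early stages train on low-guidance
# (easier, more prototypical) images; later stages move toward high-guidance
# images that stay close to the real data distribution.
STAGE_TO_GUIDANCE = {0: 0.1, 1: 0.4, 2: 0.7, 3: 0.9}

def stage_training_images(pool, stage):
    """Select the synthetic images at the guidance level for this stage."""
    g = STAGE_TO_GUIDANCE[stage]
    return [spectrum[g] for spectrum in pool.values()]
```

In this sketch the spectrum is generated once per hard sample and the curriculum only chooses which guidance level to draw from at each stage, mirroring the abstract's idea of warming up on easier low-guidance data before moving to higher-guidance data that better matches the real distribution.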
