Style-Friendly SNR Sampler for Style-Driven Generation

Abstract

Recent large-scale diffusion models generate high-quality images but struggle to learn new, personalized artistic styles, which limits the creation of unique style templates. Fine-tuning with reference images is the most promising approach, but it often blindly reuses the objectives and noise-level distributions used for pre-training, leading to suboptimal style alignment. We propose the Style-friendly SNR sampler, which aggressively shifts the signal-to-noise ratio (SNR) distribution toward higher noise levels during fine-tuning, so that training focuses on the noise levels where stylistic features emerge. This enables models to better capture unique styles and generate images with higher style alignment. Our method allows diffusion models to learn and share new "style templates", enhancing personalized content creation. We demonstrate the ability to generate styles such as personal watercolor paintings, minimal flat cartoons, 3D renderings, multi-panel images, and memes with text, thereby broadening the scope of style-driven generation.
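
The shift toward higher noise levels can be illustrated with a minimal sketch. The abstract does not specify the sampling distribution or the noising process, so everything concrete here is an assumption made for illustration: log-SNR is drawn from a normal distribution, the forward process is the rectified-flow interpolation x_t = (1 - t) * x0 + t * noise, and the names `logsnr_to_t` and `sample_t` as well as the mean and standard deviation values are hypothetical.

```python
import math
import random


def logsnr_to_t(logsnr):
    """Map a log-SNR value to a noise level t in (0, 1).

    Assumes x_t = (1 - t) * x0 + t * noise, so SNR = ((1 - t) / t) ** 2
    and therefore t = 1 / (1 + exp(logsnr / 2)).
    """
    return 1.0 / (1.0 + math.exp(logsnr / 2.0))


def sample_t(mean, std, rng):
    """Draw log-SNR ~ Normal(mean, std) and convert it to a noise level t."""
    return logsnr_to_t(rng.gauss(mean, std))


rng = random.Random(0)
n = 10_000

# Hypothetical baseline: log-SNR centred at 0 (balanced signal and noise).
baseline = [sample_t(0.0, 2.0, rng) for _ in range(n)]

# Hypothetical shifted sampler: a lower mean log-SNR means more noise.
shifted = [sample_t(-6.0, 2.0, rng) for _ in range(n)]

avg_baseline = sum(baseline) / n
avg_shifted = sum(shifted) / n
print(f"mean t, baseline: {avg_baseline:.3f}")
print(f"mean t, shifted:  {avg_shifted:.3f}")
```

Lowering the mean of the log-SNR distribution moves most sampled noise levels close to t = 1, so under these assumptions fine-tuning spends most of its steps on heavily noised inputs, which is where the abstract says stylistic features emerge.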

