LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation
December 6, 2024
Authors: Donald Shenaj, Ondrej Bohdal, Mete Ozay, Pietro Zanuttigh, Umberto Michieli
cs.AI
Abstract
Recent advancements in image generation models have enabled personalized
image creation with both user-defined subjects (content) and styles. Prior
works achieved personalization by merging corresponding low-rank adaptation
parameters (LoRAs) through optimization-based methods, which are
computationally demanding and unsuitable for real-time use on
resource-constrained devices like smartphones. To address this, we introduce
LoRA.rar, a method that not only improves image quality but also achieves a
remarkable speedup of over 4,000× in the merging process. LoRA.rar
pre-trains a hypernetwork on a diverse set of content-style LoRA pairs,
learning an efficient merging strategy that generalizes to new, unseen
content-style pairs, enabling fast, high-quality personalization. Moreover, we
identify limitations in existing evaluation metrics for content-style quality
and propose a new protocol using multimodal large language models (MLLM) for
more accurate assessment. Our method significantly outperforms the current
state of the art in both content and style fidelity, as validated by MLLM
assessments and human evaluations.
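The core idea in the abstract — replacing per-pair optimization with a single forward pass of a pre-trained hypernetwork that predicts how to combine a content LoRA and a style LoRA — can be sketched as below. This is a minimal illustration only: the layer shapes, the norm-based stand-in for the hypernetwork, and the scalar merge coefficients are assumptions for demonstration, not the paper's actual architecture (which is learned offline on many content-style pairs).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: one LoRA-adapted layer with rank r on a d_out x d_in weight.
d_out, d_in, rank = 8, 8, 4

def make_lora():
    """Return a (down, up) low-rank pair; the weight update is delta_W = up @ down."""
    down = rng.normal(size=(rank, d_in))
    up = rng.normal(size=(d_out, rank))
    return down, up

content_lora = make_lora()  # subject (content) LoRA
style_lora = make_lora()    # style LoRA

def hypernetwork(content, style):
    """Illustrative stand-in for the pre-trained hypernetwork: it maps the two
    LoRAs to merge coefficients. Here the coefficients are simply derived from
    parameter norms; the real model is a learned network generalizing to
    unseen content-style pairs."""
    nc = sum(np.linalg.norm(p) for p in content)
    ns = sum(np.linalg.norm(p) for p in style)
    alpha = nc / (nc + ns)
    return alpha, 1.0 - alpha

def merge(content, style):
    """One cheap forward pass replaces per-pair optimization: combine the two
    low-rank weight updates using the predicted coefficients."""
    a, b = hypernetwork(content, style)
    delta_content = content[1] @ content[0]  # (d_out, d_in)
    delta_style = style[1] @ style[0]        # (d_out, d_in)
    return a * delta_content + b * delta_style

delta_w = merge(content_lora, style_lora)
print(delta_w.shape)  # merged update has the full layer's shape: (8, 8)
```

The contrast with optimization-based merging is that nothing here is iterated at deployment time: given two new LoRAs, one hypernetwork evaluation yields the merged update, which is what makes on-device (e.g. smartphone) personalization feasible.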