LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation

December 6, 2024
Authors: Donald Shenaj, Ondrej Bohdal, Mete Ozay, Pietro Zanuttigh, Umberto Michieli
cs.AI

Abstract

Recent advancements in image generation models have enabled personalized image creation with both user-defined subjects (content) and styles. Prior works achieved personalization by merging the corresponding low-rank adaptation parameters (LoRAs) through optimization-based methods, which are computationally demanding and unsuitable for real-time use on resource-constrained devices such as smartphones. To address this, we introduce LoRA.rar, a method that not only improves image quality but also achieves a speedup of over 4000× in the merging process. LoRA.rar pre-trains a hypernetwork on a diverse set of content-style LoRA pairs, learning an efficient merging strategy that generalizes to new, unseen content-style pairs and enables fast, high-quality personalization. Moreover, we identify limitations in existing evaluation metrics for content-style quality and propose a new protocol that uses multimodal large language models (MLLMs) for more accurate assessment. Our method significantly outperforms the current state of the art in both content and style fidelity, as validated by MLLM assessments and human evaluations.
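The abstract does not spell out the hypernetwork's architecture or what form the merging strategy takes, so the following is only a minimal sketch of the general idea under stated assumptions: a small network looks at a content LoRA update and a style LoRA update and predicts one merging coefficient per column for each, so that merging needs a single forward pass instead of a per-pair optimization. The two-layer MLP, the per-column coefficients, and all names (`init_hypernet`, `predict_coefficients`, `merge`) are hypothetical and are not taken from the paper; the weights here are random, whereas the method described above would pre-train them on many content-style pairs.

```python
import numpy as np

def lora_delta(B, A):
    """Weight update implied by a LoRA pair: delta_W = B @ A."""
    return B @ A

def init_hypernet(in_dim, hidden_dim, rng):
    """Random weights for a tiny two-layer MLP (hypothetical architecture)."""
    return {
        "W1": rng.normal(0.0, 0.1, size=(in_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(0.0, 0.1, size=(hidden_dim, 2)),
        "b2": np.zeros(2),
    }

def predict_coefficients(params, delta_c, delta_s):
    """Predict one (content, style) coefficient pair per column (assumption)."""
    # One feature row per column: the content column concatenated with the style column.
    features = np.concatenate([delta_c.T, delta_s.T], axis=1)  # (n_cols, 2 * d_out)
    hidden = np.maximum(features @ params["W1"] + params["b1"], 0.0)  # ReLU
    coeffs = hidden @ params["W2"] + params["b2"]  # (n_cols, 2)
    return coeffs[:, 0], coeffs[:, 1]

def merge(params, delta_c, delta_s):
    """Merge two LoRA updates with predicted per-column coefficients."""
    m_c, m_s = predict_coefficients(params, delta_c, delta_s)
    # Coefficient vectors broadcast across rows, scaling each column.
    return delta_c * m_c + delta_s * m_s

# Toy usage with random LoRA factors (illustrative sizes only).
rng = np.random.default_rng(0)
d_out, d_in, rank = 8, 6, 2
delta_c = lora_delta(rng.normal(size=(d_out, rank)), rng.normal(size=(rank, d_in)))
delta_s = lora_delta(rng.normal(size=(d_out, rank)), rng.normal(size=(rank, d_in)))
params = init_hypernet(in_dim=2 * d_out, hidden_dim=16, rng=rng)
merged = merge(params, delta_c, delta_s)
print(merged.shape)
```

In this sketch the cost of merging a new pair is one small forward pass per layer, which is the kind of saving the abstract attributes to replacing optimization-based merging with a pre-trained hypernetwork.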
