ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation
February 13, 2025
Authors: Rotem Shalev-Arkushin, Rinon Gal, Amit H. Bermano, Ohad Fried
cs.AI
Abstract
Diffusion models enable high-quality and diverse visual content synthesis.
However, they struggle to generate rare or unseen concepts. To address this
challenge, we explore the use of Retrieval-Augmented Generation (RAG) with
image generation models. We propose ImageRAG, a method that dynamically
retrieves relevant images based on a given text prompt and uses them as
context to guide the generation process. Prior approaches that used retrieved
images to improve generation trained models specifically for retrieval-based
generation. In contrast, ImageRAG leverages the capabilities of existing image
conditioning models and does not require RAG-specific training. Our approach
is highly adaptable and can be applied across different model types, showing
significant improvement in generating rare and fine-grained concepts using
different base models.
Our project page is available at: https://rotem-shalev.github.io/ImageRAG
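
The retrieval step described in the abstract (selecting reference images relevant to a text prompt) can be illustrated with a minimal sketch. This is not the authors' implementation: the three-dimensional embeddings, the file names, and the helper `retrieve_top_k` are hypothetical, and the sketch assumes that the prompt and the images have already been embedded in a shared vector space by some joint text-image encoder, which the abstract does not specify. The retrieved images would then be passed as context to an existing image-conditioned generation model, a step not shown here.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Treat a zero vector as having no similarity to anything.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_top_k(prompt_vec, image_db, k=2):
    """Return the names of the k images most similar to the prompt embedding."""
    ranked = sorted(
        image_db.items(),
        key=lambda item: cosine(prompt_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Toy image database: hypothetical file names mapped to placeholder embeddings.
image_db = {
    "golden_retriever.jpg": [0.9, 0.1, 0.0],
    "tabby_cat.jpg": [0.1, 0.9, 0.0],
    "red_car.jpg": [0.0, 0.1, 0.9],
}

# Placeholder embedding standing in for an encoded text prompt.
prompt_vec = [0.8, 0.3, 0.0]

references = retrieve_top_k(prompt_vec, image_db, k=2)
print(references)  # ['golden_retriever.jpg', 'tabby_cat.jpg']
```

Because retrieval happens at generation time rather than during training, the image database can be swapped or extended without retraining the generator, which is consistent with the abstract's claim that no RAG-specific training is required.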