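ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation

Abstract

Diffusion models enable high-quality and diverse visual content synthesis. However, they struggle to generate rare or unseen concepts. To address this challenge, we explore the use of Retrieval-Augmented Generation (RAG) with image generation models. We propose ImageRAG, a method that dynamically retrieves relevant images based on a given text prompt and uses them as context to guide the generation process. Prior approaches that used retrieved images to improve generation trained models specifically for retrieval-based generation. In contrast, ImageRAG leverages the capabilities of existing image conditioning models and does not require RAG-specific training. Our approach is highly adaptable and can be applied across different model types, showing significant improvement in generating rare and fine-grained concepts using different base models. Our project page is available at: https://rotem-shalev.github.io/ImageRAG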
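The retrieve-then-condition idea can be illustrated with a minimal sketch. The abstract does not specify the embedding model, the similarity measure, or how references are passed to the generator, so everything below is an assumption made for illustration: the toy three-dimensional embeddings, cosine similarity as the ranking score, and the hypothetical helpers `retrieve_references` and `build_conditioned_request`. It is not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_references(
    prompt_vec: Sequence[float],
    image_index: Dict[str, Sequence[float]],
    k: int = 1,
) -> List[str]:
    """Return the ids of the k images most similar to the prompt embedding.

    Hypothetical helper: assumes prompt and images share one embedding space.
    """
    ranked = sorted(
        image_index.items(),
        key=lambda item: cosine(prompt_vec, item[1]),
        reverse=True,
    )
    return [image_id for image_id, _ in ranked[:k]]


def build_conditioned_request(prompt: str, reference_ids: List[str]) -> dict:
    """Bundle the prompt with retrieved references for an image-conditioned model.

    Hypothetical request format; a real image conditioning model defines its own.
    """
    return {"prompt": prompt, "reference_images": list(reference_ids)}


# Toy index of precomputed image embeddings (made-up values).
index = {
    "golden_retriever.jpg": [0.9, 0.1, 0.0],
    "axolotl.jpg": [0.1, 0.9, 0.2],
    "teapot.jpg": [0.0, 0.2, 0.9],
}
prompt_vec = [0.2, 0.8, 0.1]  # stand-in embedding for the text prompt

refs = retrieve_references(prompt_vec, index, k=1)
request = build_conditioned_request("an axolotl wearing a hat", refs)
```

Because retrieval happens per prompt and the generator is only given extra context, no retrieval-specific training step appears anywhere in the sketch, which mirrors the claim in the abstract.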

