ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation

February 13, 2025
Authors: Rotem Shalev-Arkushin, Rinon Gal, Amit H. Bermano, Ohad Fried
cs.AI

Abstract

Diffusion models enable high-quality and diverse visual content synthesis. However, they struggle to generate rare or unseen concepts. To address this challenge, we explore the use of Retrieval-Augmented Generation (RAG) with image generation models. We propose ImageRAG, a method that dynamically retrieves relevant images based on a given text prompt and uses them as context to guide the generation process. Prior approaches that used retrieved images to improve generation trained models specifically for retrieval-based generation. In contrast, ImageRAG leverages the capabilities of existing image conditioning models and does not require RAG-specific training. Our approach is highly adaptable and can be applied across different model types, showing significant improvement in generating rare and fine-grained concepts using different base models. Our project page is available at: https://rotem-shalev.github.io/ImageRAG
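
The retrieval step described in the abstract can be pictured with a small sketch. This is not the authors' implementation: the abstract does not say how images are indexed or scored, so the example assumes, purely for illustration, that the prompt and every candidate image have already been mapped into a shared embedding space and that relevance is measured by cosine similarity. The names `cosine_similarity`, `retrieve_top_k`, `toy_index`, and the three-dimensional vectors are all hypothetical. Passing the retrieved images to an image-conditioned generator is left out.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(prompt_embedding, image_index, k=1):
    """Return the names of the k images whose embeddings best match the prompt.

    image_index maps an image name to its embedding. Both the index and the
    embeddings are assumptions made for this sketch.
    """
    scored = [
        (cosine_similarity(prompt_embedding, embedding), name)
        for name, embedding in image_index.items()
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical toy data: real embeddings would come from a text-image encoder.
toy_index = {
    "golden_retriever.jpg": [0.9, 0.1, 0.0],
    "tabby_cat.jpg": [0.1, 0.9, 0.0],
    "red_bicycle.jpg": [0.0, 0.1, 0.9],
}
toy_prompt_embedding = [0.8, 0.2, 0.1]

# The selected names stand in for the reference images that would be handed
# to an image-conditioned generator as context.
references = retrieve_top_k(toy_prompt_embedding, toy_index, k=2)
```

In a pipeline of the kind the abstract describes, the retrieved images would then be supplied as conditioning input to an existing generator, which is why no retrieval-specific training appears in this sketch.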
