ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation

Abstract

Diffusion models enable high-quality and diverse visual content synthesis. However, they struggle to generate rare or unseen concepts. To address this challenge, we explore the use of Retrieval-Augmented Generation (RAG) with image generation models. We propose ImageRAG, a method that dynamically retrieves relevant images based on a given text prompt and uses them as context to guide the generation process. Prior approaches that used retrieved images to improve generation trained models specifically for retrieval-based generation. In contrast, ImageRAG leverages the capabilities of existing image conditioning models and does not require RAG-specific training. Our approach is highly adaptable and can be applied across different model types, showing significant improvement in generating rare and fine-grained concepts with different base models. Our project page is available at: https://rotem-shalev.github.io/ImageRAG
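The retrieve-then-condition step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes the prompt and a pool of candidate images have already been embedded into a shared vector space (for example by a CLIP-style encoder), and it uses plain cosine-similarity top-k ranking as a stand-in retriever. The helper names (`retrieve_references`, `build_generation_request`), the toy vectors, and the request format are all hypothetical. The call to an actual image-conditioned generator is omitted.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_references(
    prompt_embedding: Sequence[float],
    image_index: Dict[str, Sequence[float]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Hypothetical retriever: return the k image IDs most similar to the prompt.

    Assumes prompt and image embeddings live in the same vector space.
    """
    scored = [
        (image_id, cosine_similarity(prompt_embedding, emb))
        for image_id, emb in image_index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def build_generation_request(
    prompt: str, references: List[Tuple[str, float]]
) -> Dict[str, object]:
    """Hypothetical request format: the prompt plus retrieved images as context
    for an existing image-conditioned generator (no RAG-specific training)."""
    return {
        "prompt": prompt,
        "reference_images": [image_id for image_id, _ in references],
    }


# Toy usage with made-up 3-dimensional embeddings.
index = {
    "okapi_01.jpg": [0.9, 0.1, 0.0],
    "zebra_03.jpg": [0.6, 0.4, 0.2],
    "city_07.jpg": [0.0, 0.2, 0.9],
}
prompt_vec = [1.0, 0.0, 0.0]  # stand-in embedding of "an okapi drinking from a river"
refs = retrieve_references(prompt_vec, index, k=2)
request = build_generation_request("an okapi drinking from a river", refs)
print(request["reference_images"])  # ['okapi_01.jpg', 'zebra_03.jpg']
```

Keeping retrieval separate from generation reflects the abstract's point that no RAG-specific training is required: the retrieved references are simply passed as extra conditioning to whichever image-conditioned model is available.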
