

DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation

October 24, 2024
Authors: Yuang Ai, Xiaoqiang Zhou, Huaibo Huang, Xiaotian Han, Zhengyu Chen, Quanzeng You, Hongxia Yang
cs.AI

Abstract
Image restoration (IR) in real-world scenarios presents significant challenges due to the lack of high-capacity models and comprehensive datasets. To tackle these issues, we present a dual strategy: GenIR, an innovative data curation pipeline, and DreamClear, a cutting-edge Diffusion Transformer (DiT)-based image restoration model. GenIR, our pioneering contribution, is a dual-prompt learning pipeline that overcomes the limitations of existing datasets, which typically comprise only a few thousand images and thus offer limited generalizability for larger models. GenIR streamlines the process into three stages: image-text pair construction, dual-prompt based fine-tuning, and data generation & filtering. This approach circumvents the laborious data crawling process, ensuring copyright compliance and providing a cost-effective, privacy-safe solution for IR dataset construction. The result is a large-scale dataset of one million high-quality images. Our second contribution, DreamClear, is a DiT-based image restoration model. It utilizes the generative priors of text-to-image (T2I) diffusion models and the robust perceptual capabilities of multi-modal large language models (MLLMs) to achieve photorealistic restoration. To boost the model's adaptability to diverse real-world degradations, we introduce the Mixture of Adaptive Modulator (MoAM). It employs token-wise degradation priors to dynamically integrate various restoration experts, thereby expanding the range of degradations the model can address. Our exhaustive experiments confirm DreamClear's superior performance, underlining the efficacy of our dual strategy for real-world image restoration. Code and pre-trained models will be available at: https://github.com/shallowdream204/DreamClear.
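The abstract describes MoAM as combining restoration experts per token, weighted by a token-wise degradation prior. A minimal sketch of that mixture-of-experts idea is below, using numpy and hypothetical shapes (`num_tokens`, `dim`, `num_experts` and the linear "experts" are illustrative assumptions, not the authors' implementation, which modulates a DiT backbone):

```python
import numpy as np

rng = np.random.default_rng(0)
num_tokens, dim, num_experts = 4, 8, 3  # hypothetical sizes

# Token features and a token-wise degradation prior (assumed shapes).
tokens = rng.standard_normal((num_tokens, dim))
degradation_prior = rng.standard_normal((num_tokens, dim))

# Gating network: map each token's degradation prior to expert weights.
W_gate = rng.standard_normal((dim, num_experts))
logits = degradation_prior @ W_gate
gates = np.exp(logits - logits.max(axis=-1, keepdims=True))
gates /= gates.sum(axis=-1, keepdims=True)  # (num_tokens, num_experts), rows sum to 1

# Each "restoration expert" is reduced to a linear map for illustration.
experts = rng.standard_normal((num_experts, dim, dim))
expert_out = np.einsum('td,edk->tek', tokens, experts)  # (tokens, experts, dim)

# Token-wise weighted combination of expert outputs.
out = np.einsum('te,tek->tk', gates, expert_out)  # (num_tokens, dim)
```

The key property is that the gate is computed per token, so different image regions (e.g. blurred vs. noisy areas) can draw on different experts.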

