

DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation

October 24, 2024
Authors: Yuang Ai, Xiaoqiang Zhou, Huaibo Huang, Xiaotian Han, Zhengyu Chen, Quanzeng You, Hongxia Yang
cs.AI

Abstract

Image restoration (IR) in real-world scenarios presents significant challenges due to the lack of high-capacity models and comprehensive datasets. To tackle these issues, we present a dual strategy: GenIR, an innovative data curation pipeline, and DreamClear, a cutting-edge Diffusion Transformer (DiT)-based image restoration model. GenIR, our pioneering contribution, is a dual-prompt learning pipeline that overcomes the limitations of existing datasets, which typically comprise only a few thousand images and thus offer limited generalizability for larger models. GenIR streamlines the process into three stages: image-text pair construction, dual-prompt based fine-tuning, and data generation & filtering. This approach circumvents the laborious data crawling process, ensuring copyright compliance and providing a cost-effective, privacy-safe solution for IR dataset construction. The result is a large-scale dataset of one million high-quality images. Our second contribution, DreamClear, is a DiT-based image restoration model. It utilizes the generative priors of text-to-image (T2I) diffusion models and the robust perceptual capabilities of multi-modal large language models (MLLMs) to achieve photorealistic restoration. To boost the model's adaptability to diverse real-world degradations, we introduce the Mixture of Adaptive Modulator (MoAM). It employs token-wise degradation priors to dynamically integrate various restoration experts, thereby expanding the range of degradations the model can address. Our exhaustive experiments confirm DreamClear's superior performance, underlining the efficacy of our dual strategy for real-world image restoration. Code and pre-trained models will be available at: https://github.com/shallowdream204/DreamClear.
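To make the MoAM idea in the abstract more concrete (token-wise degradation priors used to softly combine several restoration experts), the snippet below gives a minimal PyTorch sketch of a token-wise mixture-of-experts modulator. Everything here is an assumption for illustration only: the class names (`MoAMSketch`, `AdaptiveModulatorExpert`), the scale-and-shift expert design, and all tensor shapes are hypothetical and do not reproduce the authors' released implementation.

```python
# Illustrative sketch only: a token-wise mixture of modulation experts,
# loosely inspired by the MoAM description in the abstract. Names, shapes,
# and the expert design are assumptions, not the paper's actual architecture.
import torch
import torch.nn as nn


class AdaptiveModulatorExpert(nn.Module):
    """One restoration 'expert': predicts a per-token scale and shift."""

    def __init__(self, dim: int):
        super().__init__()
        self.to_scale_shift = nn.Linear(dim, 2 * dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scale, shift = self.to_scale_shift(x).chunk(2, dim=-1)
        return x * (1 + scale) + shift


class MoAMSketch(nn.Module):
    """Routes each token to a soft mixture of experts via a degradation prior."""

    def __init__(self, dim: int, prior_dim: int, num_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList(
            AdaptiveModulatorExpert(dim) for _ in range(num_experts)
        )
        # Gating network maps the token-wise degradation prior to expert weights.
        self.gate = nn.Linear(prior_dim, num_experts)

    def forward(self, tokens: torch.Tensor, degradation_prior: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, dim); degradation_prior: (B, N, prior_dim)
        weights = self.gate(degradation_prior).softmax(dim=-1)                # (B, N, E)
        expert_outs = torch.stack([e(tokens) for e in self.experts], dim=-1)  # (B, N, dim, E)
        # Per-token weighted sum over the experts' modulated outputs.
        return (expert_outs * weights.unsqueeze(2)).sum(dim=-1)


if __name__ == "__main__":
    moam = MoAMSketch(dim=64, prior_dim=32, num_experts=4)
    feats = torch.randn(2, 16, 64)   # dummy feature tokens
    prior = torch.randn(2, 16, 32)   # dummy token-wise degradation prior
    print(moam(feats, prior).shape)  # torch.Size([2, 16, 64])
```

The point of the soft, per-token gating is that different spatial regions of a degraded image can lean on different experts (e.g., one tuned to blur, another to noise), which is how the abstract frames MoAM's ability to widen the range of degradations the model can handle.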
