
DiffuEraser: A Diffusion Model for Video Inpainting

January 17, 2025
Authors: Xiaowen Li, Haolan Xue, Peiran Ren, Liefeng Bo
cs.AI

Abstract

Recent video inpainting algorithms integrate flow-based pixel propagation with transformer-based generation: optical flow restores textures and objects using information from neighboring frames, while visual Transformers complete the masked regions. However, these approaches often produce blurring and temporal inconsistencies when dealing with large masks, highlighting the need for models with stronger generative capabilities. Recently, diffusion models have emerged as a prominent technique in image and video generation due to their impressive performance. In this paper, we introduce DiffuEraser, a video inpainting model based on stable diffusion, designed to fill masked regions with greater detail and more coherent structures. We incorporate prior information to provide initialization and weak conditioning, which helps mitigate noisy artifacts and suppress hallucinations. Additionally, to improve temporal consistency during long-sequence inference, we expand the temporal receptive fields of both the prior model and DiffuEraser, and further enhance consistency by leveraging the temporal smoothing property of Video Diffusion Models. Experimental results demonstrate that our proposed method outperforms state-of-the-art techniques in both content completeness and temporal consistency while maintaining acceptable efficiency.
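The abstract's idea of exploiting temporal smoothing during long-sequence inference is commonly realized by processing the video in overlapping temporal windows and averaging the model's outputs where windows overlap, so that seams between clips are blended away. The sketch below illustrates that blending scheme only; it is not the authors' implementation, and `denoise` is a hypothetical stand-in for one pass of the video diffusion model.

```python
import numpy as np

def smooth_long_sequence(frames: np.ndarray, window: int, stride: int) -> np.ndarray:
    """Process a long frame sequence in overlapping temporal windows and
    average predictions in the overlaps (a generic blending sketch, not
    DiffuEraser's actual pipeline)."""
    def denoise(clip: np.ndarray) -> np.ndarray:
        # Hypothetical placeholder for one diffusion denoising pass
        # over a short clip; identity here so the sketch is runnable.
        return clip

    n = frames.shape[0]
    out = np.zeros_like(frames, dtype=float)
    # Per-frame weight, broadcastable over the remaining dimensions.
    weight = np.zeros((n,) + (1,) * (frames.ndim - 1))

    # Window start positions; append a final window so the tail is covered.
    starts = list(range(0, max(n - window, 0) + 1, stride))
    if starts[-1] + window < n:
        starts.append(n - window)

    for start in starts:
        end = min(start + window, n)
        out[start:end] += denoise(frames[start:end])
        weight[start:end] += 1.0

    # Frames covered by several windows are averaged, smoothing clip seams.
    return out / weight
```

With a real model in place of `denoise`, each frame's final latent is the mean of every window's prediction for it, which suppresses discontinuities at clip boundaries.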
