
Spatiotemporal Skip Guidance for Enhanced Video Diffusion Sampling

November 27, 2024
Authors: Junha Hyung, Kinam Kim, Susung Hong, Min-Jung Kim, Jaegul Choo
cs.AI

Abstract

Diffusion models have emerged as a powerful tool for generating high-quality images, videos, and 3D content. While sampling guidance techniques like classifier-free guidance (CFG) improve quality, they reduce diversity and motion. Autoguidance mitigates these issues but demands extra weak-model training, limiting its practicality for large-scale models. In this work, we introduce Spatiotemporal Skip Guidance (STG), a simple training-free sampling guidance method for enhancing transformer-based video diffusion models. STG employs an implicit weak model via self-perturbation, avoiding the need for external models or additional training. By selectively skipping spatiotemporal layers, STG produces an aligned, degraded version of the original model to boost sample quality without compromising diversity or dynamic degree. Our contributions include: (1) introducing STG as an efficient, high-performing guidance technique for video diffusion models; (2) eliminating the need for auxiliary models by simulating a weak model through layer skipping; and (3) ensuring quality-enhanced guidance without compromising sample diversity or dynamics, unlike CFG. For additional results, visit https://junhahyung.github.io/STGuidance.
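
The guidance rule the abstract describes has the same extrapolation form as autoguidance, with the layer-skipped model standing in for a separately trained weak model. Below is a minimal, hypothetical PyTorch-style sketch of one such guided denoising step; the function name `stg_denoise`, the `skip_layers` keyword, and the noise-prediction interface are illustrative assumptions, not the authors' published API.

```python
import torch

@torch.no_grad()
def stg_denoise(model, x_t, t, cond, skip_layers, scale=1.0):
    """One STG-guided denoising step (minimal sketch).

    Assumes `model` accepts a hypothetical `skip_layers` argument that
    bypasses the listed spatiotemporal transformer blocks; a real backbone
    would implement this with forward hooks or block-level flags.
    """
    # Prediction from the full, unmodified network.
    eps_full = model(x_t, t, cond)
    # "Weak" prediction from the self-perturbed network: identical weights,
    # but with the selected spatiotemporal layers skipped.
    eps_weak = model(x_t, t, cond, skip_layers=skip_layers)
    # Extrapolate away from the degraded prediction, autoguidance-style:
    # eps = eps_full + s * (eps_full - eps_weak)
    return eps_full + scale * (eps_full - eps_weak)
```

In this form, scale=0 recovers the base model's prediction, and larger values push samples further from the degraded, layer-skipped prediction; no external weak model or extra training is involved.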
