NormalCrafter: Learning Temporally Consistent Normals from Video Diffusion Priors
April 15, 2025
Authors: Yanrui Bin, Wenbo Hu, Haoyuan Wang, Xinya Chen, Bing Wang
cs.AI
Abstract
表面法线估计是众多计算机视觉应用的基础。尽管已有大量研究致力于静态图像场景,但在视频法线估计中确保时间一致性仍是一个重大挑战。不同于简单地为现有方法添加时间组件,我们提出了NormalCrafter,旨在充分利用视频扩散模型固有的时间先验。为了在序列中实现高保真的法线估计,我们提出了语义特征正则化(SFR),它通过将扩散特征与语义线索对齐,促使模型聚焦于场景的内在语义。此外,我们引入了一种两阶段训练策略,结合潜在空间和像素空间的学习,以在保持长时间上下文的同时确保空间精度。广泛的评估验证了我们方法的有效性,展示了其在从多样视频中生成具有精细细节且时间一致的法线序列方面的卓越性能。
Surface normal estimation serves as a cornerstone for a spectrum of computer
vision applications. While numerous efforts have been devoted to static image
scenarios, ensuring temporal coherence in video-based normal estimation remains
a formidable challenge. Instead of merely augmenting existing methods with
temporal components, we present NormalCrafter to leverage the inherent temporal
priors of video diffusion models. To secure high-fidelity normal estimation
across sequences, we propose Semantic Feature Regularization (SFR), which
aligns diffusion features with semantic cues, encouraging the model to
concentrate on the intrinsic semantics of the scene. Moreover, we introduce a
two-stage training protocol that leverages both latent and pixel space learning
to preserve spatial accuracy while maintaining long temporal context. Extensive
evaluations demonstrate the efficacy of our method, showcasing a superior
performance in generating temporally consistent normal sequences with intricate
details from diverse videos.
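The abstract describes Semantic Feature Regularization (SFR) as aligning intermediate diffusion features with semantic cues from the scene. A minimal sketch of one plausible form of such a loss is shown below, assuming PyTorch, a learned 1x1 projection to match channel dimensions, and per-location cosine alignment against a frozen semantic encoder (e.g. DINO-style features); the class and argument names are illustrative, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SFRLoss(nn.Module):
    """Sketch of a feature-alignment regularizer in the spirit of SFR.

    Projects diffusion U-Net features to the semantic encoder's channel
    dimension, then penalizes (1 - cosine similarity) at each spatial
    location. All names and shapes here are assumptions for illustration.
    """

    def __init__(self, diff_dim: int, sem_dim: int):
        super().__init__()
        # Learned 1x1 projection from diffusion-feature channels to
        # semantic-feature channels (an assumed design choice).
        self.proj = nn.Conv2d(diff_dim, sem_dim, kernel_size=1)

    def forward(self, diff_feats: torch.Tensor, sem_feats: torch.Tensor) -> torch.Tensor:
        # diff_feats: (B, diff_dim, h, w) from the video diffusion backbone
        # sem_feats:  (B, sem_dim, H, W) from a frozen semantic encoder
        x = self.proj(diff_feats)
        # Resample semantic features to the diffusion feature resolution.
        sem = F.interpolate(sem_feats, size=x.shape[-2:],
                            mode="bilinear", align_corners=False)
        # Detach the semantic target so only the diffusion branch is regularized.
        cos = F.cosine_similarity(x, sem.detach(), dim=1)  # (B, h, w)
        return (1.0 - cos).mean()
```

In a training loop, this term would typically be added to the main normal-estimation objective with a small weight, encouraging the diffusion features to stay semantically grounded without dominating the primary loss.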