NormalCrafter: Learning Temporally Consistent Normals from Video Diffusion Priors
April 15, 2025
Authors: Yanrui Bin, Wenbo Hu, Haoyuan Wang, Xinya Chen, Bing Wang
cs.AI
Abstract
Surface normal estimation serves as a cornerstone for a spectrum of computer
vision applications. While numerous efforts have been devoted to static image
scenarios, ensuring temporal coherence in video-based normal estimation remains
a formidable challenge. Instead of merely augmenting existing methods with
temporal components, we present NormalCrafter to leverage the inherent temporal
priors of video diffusion models. To secure high-fidelity normal estimation
across sequences, we propose Semantic Feature Regularization (SFR), which
aligns diffusion features with semantic cues, encouraging the model to
concentrate on the intrinsic semantics of the scene. Moreover, we introduce a
two-stage training protocol that leverages both latent and pixel space learning
to preserve spatial accuracy while maintaining long temporal context. Extensive
evaluations demonstrate the efficacy of our method, showcasing a superior
performance in generating temporally consistent normal sequences with intricate
details from diverse videos.
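The Semantic Feature Regularization (SFR) described in the abstract aligns diffusion features with semantic cues. A minimal sketch of one plausible form of such a loss is shown below: intermediate diffusion features are projected into the space of a frozen semantic encoder and pulled toward its features via negative cosine similarity. The projection head, dimensions, and cosine-similarity form are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SFRLoss(nn.Module):
    """Hypothetical sketch of a semantic feature alignment loss.

    Projects diffusion features into the semantic feature space and
    penalizes their cosine distance to features from a frozen semantic
    encoder (e.g. a DINO-style backbone). All names and dimensions here
    are assumptions for illustration.
    """

    def __init__(self, diff_dim: int, sem_dim: int):
        super().__init__()
        # Linear head mapping diffusion features to the semantic dimension.
        self.proj = nn.Linear(diff_dim, sem_dim)

    def forward(self, diff_feats: torch.Tensor, sem_feats: torch.Tensor) -> torch.Tensor:
        # diff_feats: (B, N, diff_dim) tokens from a diffusion backbone block
        # sem_feats:  (B, N, sem_dim)  tokens from a frozen semantic encoder
        pred = F.normalize(self.proj(diff_feats), dim=-1)
        target = F.normalize(sem_feats, dim=-1)
        # Negative cosine similarity (as 1 - cos), averaged over tokens and batch.
        return (1.0 - (pred * target).sum(dim=-1)).mean()

# Toy usage with random tensors standing in for real features.
loss_fn = SFRLoss(diff_dim=320, sem_dim=768)
diff = torch.randn(2, 64, 320)
sem = torch.randn(2, 64, 768)
loss = loss_fn(diff, sem)
print(float(loss))
```

In a training loop, this term would be added to the diffusion objective with a weighting coefficient, encouraging the denoiser's features to stay anchored to scene semantics across frames.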