NormalCrafter: Learning Temporally Consistent Normals from Video Diffusion Priors
April 15, 2025
Authors: Yanrui Bin, Wenbo Hu, Haoyuan Wang, Xinya Chen, Bing Wang
cs.AI
Abstract
Surface normal estimation serves as a cornerstone for a spectrum of computer
vision applications. While numerous efforts have been devoted to static image
scenarios, ensuring temporal coherence in video-based normal estimation remains
a formidable challenge. Instead of merely augmenting existing methods with
temporal components, we present NormalCrafter to leverage the inherent temporal
priors of video diffusion models. To secure high-fidelity normal estimation
across sequences, we propose Semantic Feature Regularization (SFR), which
aligns diffusion features with semantic cues, encouraging the model to
concentrate on the intrinsic semantics of the scene. Moreover, we introduce a
two-stage training protocol that leverages both latent and pixel space learning
to preserve spatial accuracy while maintaining long temporal context. Extensive
evaluations demonstrate the efficacy of our method, showcasing a superior
performance in generating temporally consistent normal sequences with intricate
details from diverse videos.
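The Semantic Feature Regularization (SFR) described in the abstract aligns diffusion features with semantic cues. A minimal sketch of one plausible form of such a loss is shown below: intermediate diffusion features are projected into the space of a frozen semantic encoder and pulled toward its features via negative cosine similarity. The projection head, dimensions, and cosine-similarity form are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SFRLoss(nn.Module):
    """Hypothetical sketch of a semantic feature alignment loss.

    Projects diffusion features into the semantic feature space and
    penalizes their cosine distance to features from a frozen semantic
    encoder (e.g. a DINO-style backbone). All names and dimensions here
    are assumptions for illustration.
    """

    def __init__(self, diff_dim: int, sem_dim: int):
        super().__init__()
        # Linear head mapping diffusion features to the semantic dimension.
        self.proj = nn.Linear(diff_dim, sem_dim)

    def forward(self, diff_feats: torch.Tensor, sem_feats: torch.Tensor) -> torch.Tensor:
        # diff_feats: (B, N, diff_dim) tokens from a diffusion backbone block
        # sem_feats:  (B, N, sem_dim)  tokens from a frozen semantic encoder
        pred = F.normalize(self.proj(diff_feats), dim=-1)
        target = F.normalize(sem_feats, dim=-1)
        # Negative cosine similarity (as 1 - cos), averaged over tokens and batch.
        return (1.0 - (pred * target).sum(dim=-1)).mean()

# Toy usage with random tensors standing in for real features.
loss_fn = SFRLoss(diff_dim=320, sem_dim=768)
diff = torch.randn(2, 64, 320)
sem = torch.randn(2, 64, 768)
loss = loss_fn(diff, sem)
print(float(loss))
```

In a training loop, this term would be added to the diffusion objective with a weighting coefficient, encouraging the denoiser's features to stay anchored to scene semantics across frames.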