Dynamic Camera Poses and Where to Find Them

Abstract

Annotating camera poses on dynamic Internet videos at scale is critical for advancing fields like realistic video generation and simulation. However, collecting such a dataset is difficult, as most Internet videos are unsuitable for pose estimation. Furthermore, annotating dynamic Internet videos presents significant challenges even for state-of-the-art methods. In this paper, we introduce DynPose-100K, a large-scale dataset of dynamic Internet videos annotated with camera poses. Our collection pipeline addresses filtering using a carefully combined set of task-specific and generalist models. For pose estimation, we combine the latest techniques of point tracking, dynamic masking, and structure-from-motion to achieve improvements over state-of-the-art approaches. Our analysis and experiments demonstrate that DynPose-100K is both large-scale and diverse across several key attributes, opening up avenues for advancements in various downstream applications.

