Depth Pro: Sharp Monocular Metric Depth in Less Than a Second
October 2, 2024
Authors: Aleksei Bochkovskii, Amaël Delaunoy, Hugo Germain, Marcel Santos, Yichao Zhou, Stephan R. Richter, Vladlen Koltun
cs.AI
Abstract
We present a foundation model for zero-shot metric monocular depth
estimation. Our model, Depth Pro, synthesizes high-resolution depth maps with
unparalleled sharpness and high-frequency details. The predictions are metric,
with absolute scale, without relying on the availability of metadata such as
camera intrinsics. And the model is fast, producing a 2.25-megapixel depth map
in 0.3 seconds on a standard GPU. These characteristics are enabled by a number
of technical contributions, including an efficient multi-scale vision
transformer for dense prediction, a training protocol that combines real and
synthetic datasets to achieve high metric accuracy alongside fine boundary
tracing, dedicated evaluation metrics for boundary accuracy in estimated depth
maps, and state-of-the-art focal length estimation from a single image.
Extensive experiments analyze specific design choices and demonstrate that
Depth Pro outperforms prior work along multiple dimensions. We release code and
weights at https://github.com/apple/ml-depth-pro.
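As an illustration of the metric-scale claim above, the conversion from a focal-length-independent (canonical) inverse-depth prediction to absolute metric depth can be sketched as below. This is a hedged sketch, not the repository's API: the function name and the toy 4×4 map are hypothetical, and the formula D = f_px / (w · C) follows the canonical-scale convention described in the paper.

```python
import numpy as np

def canonical_inverse_depth_to_metric(canonical_inv_depth, f_px, width):
    """Convert a canonical inverse-depth map C to metric depth.

    Uses D = f_px / (width * C), so the prediction itself need not depend
    on the camera, while the output carries absolute scale in meters once
    the focal length (estimated or from metadata) is known.
    """
    eps = 1e-6  # clip to avoid division by zero in far-away regions
    return f_px / (width * np.clip(canonical_inv_depth, eps, None))

# Toy example: a uniform 4x4 canonical inverse-depth map with a
# hypothetical focal length of 1000 pixels.
C = np.full((4, 4), 0.5)
D = canonical_inverse_depth_to_metric(C, f_px=1000.0, width=4)
# Each pixel maps to 1000 / (4 * 0.5) = 500 meters.
```

Note how the image width appears in the denominator: the same canonical prediction rescales consistently when the input resolution changes, which is what lets the metric output remain independent of the resolution at which the network runs.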