Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation

December 18, 2024
Authors: Haotong Lin, Sida Peng, Jingxiao Chen, Songyou Peng, Jiaming Sun, Minghuan Liu, Hujun Bao, Jiashi Feng, Xiaowei Zhou, Bingyi Kang
cs.AI

Abstract

Prompts play a critical role in unleashing the power of language and vision foundation models for specific tasks. For the first time, we introduce prompting into depth foundation models, creating a new paradigm for metric depth estimation termed Prompt Depth Anything. Specifically, we use a low-cost LiDAR as the prompt to guide the Depth Anything model for accurate metric depth output, achieving up to 4K resolution. Our approach centers on a concise prompt fusion design that integrates the LiDAR at multiple scales within the depth decoder. To address training challenges posed by limited datasets containing both LiDAR depth and precise GT depth, we propose a scalable data pipeline that includes synthetic data LiDAR simulation and real data pseudo GT depth generation. Our approach sets a new state of the art on the ARKitScenes and ScanNet++ datasets and benefits downstream applications, including 3D reconstruction and generalized robotic grasping.
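
The multi-scale prompt fusion mentioned in the abstract can be pictured with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the fusion operator, so the nearest-neighbour resize, the additive fusion, and the names `resize_nearest`, `fuse_prompt`, and `proj` are all assumptions made for illustration only.

```python
import numpy as np


def resize_nearest(depth, out_h, out_w):
    """Nearest-neighbour resize of a 2D depth map to (out_h, out_w)."""
    in_h, in_w = depth.shape
    rows = np.arange(out_h) * in_h // out_h  # source row for each output row
    cols = np.arange(out_w) * in_w // out_w  # source col for each output col
    return depth[rows[:, None], cols[None, :]]


def fuse_prompt(features, lidar_depth, proj):
    """Add a resized, per-channel-scaled copy of the LiDAR depth to each scale.

    features:    list of decoder feature maps, each of shape (C_i, H_i, W_i)
    lidar_depth: low-resolution metric depth map of shape (h, w)
    proj:        list of hypothetical per-channel weights, each of shape (C_i,)
    """
    fused = []
    for feat, weight in zip(features, proj):
        _, height, width = feat.shape
        prompt = resize_nearest(lidar_depth, height, width)
        # Assumed fusion rule: broadcast the prompt over channels and add it.
        fused.append(feat + weight[:, None, None] * prompt[None, :, :])
    return fused


# Two hypothetical decoder scales and a coarse 2x2 LiDAR depth map (metres).
features = [np.zeros((4, 4, 4)), np.zeros((2, 8, 8))]
proj = [np.ones(4), np.ones(2)]
lidar_depth = np.full((2, 2), 2.0)
fused = fuse_prompt(features, lidar_depth, proj)
```

In a trained model the per-channel weights would be learned and the projection would likely be a small convolutional block; the sketch only shows how one low-resolution depth prompt can be injected at several decoder resolutions.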
