
Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation

December 18, 2024
Authors: Haotong Lin, Sida Peng, Jingxiao Chen, Songyou Peng, Jiaming Sun, Minghuan Liu, Hujun Bao, Jiashi Feng, Xiaowei Zhou, Bingyi Kang
cs.AI

Abstract

Prompts play a critical role in unleashing the power of language and vision foundation models for specific tasks. For the first time, we introduce prompting into depth foundation models, creating a new paradigm for metric depth estimation termed Prompt Depth Anything. Specifically, we use a low-cost LiDAR as the prompt to guide the Depth Anything model toward accurate metric depth output at up to 4K resolution. Our approach centers on a concise prompt fusion design that integrates the LiDAR depth at multiple scales within the depth decoder. To address the training challenge posed by the limited number of datasets containing both LiDAR depth and precise ground-truth (GT) depth, we propose a scalable data pipeline that includes LiDAR simulation for synthetic data and pseudo-GT depth generation for real data. Our approach sets a new state of the art on the ARKitScenes and ScanNet++ datasets and benefits downstream applications, including 3D reconstruction and generalized robotic grasping.

