Distilling Diffusion Models to Efficient 3D LiDAR Scene Completion
December 4, 2024
Authors: Shengyuan Zhang, An Zhao, Ling Yang, Zejian Li, Chenye Meng, Haoran Xu, Tianrun Chen, AnYang Wei, Perry Pengyun GU, Lingyun Sun
cs.AI
Abstract
Diffusion models have been applied to 3D LiDAR scene completion because of their strong training stability and high completion quality. However, their slow sampling speed limits the practical use of diffusion-based scene completion models, since autonomous vehicles need to perceive their surroundings efficiently. This paper proposes ScoreLiDAR, a novel distillation method tailored to 3D LiDAR scene completion models that achieves efficient yet high-quality scene completion. After distillation, the distilled model samples in significantly fewer steps. To improve completion quality, we also introduce a novel Structural Loss, which encourages the distilled model to capture the geometric structure of the 3D LiDAR scene. The loss contains a scene-wise term that constrains the holistic structure and a point-wise term that constrains the key landmark points and their relative configuration. Extensive experiments demonstrate that ScoreLiDAR reduces the completion time on SemanticKITTI from 30.55 to 5.37 seconds per frame (a more than 5× speedup) and achieves superior performance compared with state-of-the-art 3D LiDAR scene completion models. Our code is publicly available at https://github.com/happyw1nd/ScoreLiDAR.
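The abstract describes the Structural Loss only at a high level, so the following is an illustrative sketch of how a two-term loss of that shape could be assembled, not the formulation used in ScoreLiDAR. Everything specific here is an assumption: the scene-wise term is stood in for by a symmetric Chamfer distance, the point-wise term by the mean absolute difference between the pairwise-distance matrices of the landmark points, and the names `chamfer_distance`, `landmark_term`, `structural_loss`, `key_idx`, and `weight` are hypothetical. The sketch also assumes the predicted and ground-truth clouds are index-aligned, so that the same indices pick corresponding landmarks.

```python
import math


def chamfer_distance(a, b):
    """Symmetric Chamfer distance (assumed stand-in for the scene-wise term).

    Sums the mean nearest-neighbour distance from a to b and from b to a.
    """
    def one_way(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


def pairwise_distances(points):
    """Matrix of Euclidean distances between every pair of points."""
    return [[math.dist(p, q) for q in points] for p in points]


def landmark_term(pred_keys, gt_keys):
    """Assumed stand-in for the point-wise term.

    Compares the relative configuration of corresponding landmarks via the
    mean absolute difference of their pairwise-distance matrices.
    """
    dp = pairwise_distances(pred_keys)
    dg = pairwise_distances(gt_keys)
    n = len(gt_keys)
    return sum(abs(dp[i][j] - dg[i][j]) for i in range(n) for j in range(n)) / (n * n)


def structural_loss(pred, gt, key_idx, weight=1.0):
    """Scene-wise term plus weighted point-wise term (hypothetical combination)."""
    scene = chamfer_distance(pred, gt)
    point = landmark_term([pred[i] for i in key_idx], [gt[i] for i in key_idx])
    return scene + weight * point


# Toy example: a three-point "scene" and a copy shifted by 0.5 along x.
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
shifted = [(x + 0.5, y, z) for x, y, z in gt]
loss_same = structural_loss(gt, gt, key_idx=[0, 1, 2])
loss_shifted = structural_loss(shifted, gt, key_idx=[0, 1, 2])
```

In this toy setup a rigid shift leaves the landmarks' relative configuration unchanged, so only the scene-wise stand-in penalizes it, which illustrates why the two terms capture different aspects of the geometry. A practical implementation would operate on batched tensors in a deep learning framework rather than Python lists.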