Voxel-based methods have achieved state-of-the-art performance for 3D object detection in autonomous driving. However, their significant computational and memory costs make them difficult to deploy on resource-constrained vehicles. One reason for this high resource consumption is the large number of redundant background points in LiDAR point clouds, which leads to spatial redundancy in both the 3D voxel and the dense BEV map representations. To address this issue, we propose Ada3D, an adaptive inference framework that exploits this input-level spatial redundancy. Ada3D adaptively filters the redundant input, guided by a lightweight importance predictor and the unique properties of the LiDAR point cloud. Additionally, we exploit the intrinsic sparsity of the BEV features by introducing Sparsity Preserving Batch Normalization. With Ada3D, we achieve a 40% reduction in 3D voxels and decrease the density of 2D BEV feature maps from 100% to 20% without sacrificing accuracy. Ada3D reduces the model's computational and memory cost by 5x, and achieves 1.52x/1.45x end-to-end GPU latency and 1.5x/4.5x GPU peak memory improvements for the 3D and 2D backbones, respectively.
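To illustrate why batch normalization matters for BEV sparsity, the following is a minimal NumPy sketch, not the Ada3D implementation. The abstract does not specify how Sparsity Preserving Batch Normalization works, so `masked_bn` is a hypothetical variant that assumes one plausible design: statistics are computed over active (nonzero) positions only, and empty positions are left at zero. The function names, tensor shape, and 20% input density are illustrative assumptions.

```python
import numpy as np

def density(x):
    """Fraction of spatial positions (H, W) with at least one nonzero channel."""
    return float(np.mean(np.any(x != 0, axis=0)))

def standard_bn(x, eps=1e-5):
    # Per-channel statistics over all positions, zeros included.
    # Subtracting the mean turns empty positions into nonzero values.
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def masked_bn(x, eps=1e-5):
    # ASSUMPTION: hypothetical sparsity-preserving variant, not the paper's
    # definition. Normalize only active positions; keep empty ones at zero.
    mask = np.any(x != 0, axis=0, keepdims=True)      # shape (1, H, W)
    n = max(int(mask.sum()), 1)                       # number of active positions
    mean = (x * mask).sum(axis=(1, 2), keepdims=True) / n
    var = (((x - mean) ** 2) * mask).sum(axis=(1, 2), keepdims=True) / n
    return np.where(mask, (x - mean) / np.sqrt(var + eps), 0.0)

# Synthetic BEV feature map: C channels, roughly 20% of positions active.
rng = np.random.default_rng(0)
C, H, W = 4, 50, 50
keep = rng.random((1, H, W)) < 0.2
bev = np.where(keep, rng.normal(1.0, 1.0, size=(C, H, W)), 0.0)

d_in = density(bev)
d_std = density(standard_bn(bev))      # becomes dense after mean subtraction
d_masked = density(masked_bn(bev))     # stays at the input density
print(f"input={d_in:.2f} standard_bn={d_std:.2f} masked_bn={d_masked:.2f}")
```

Under these assumptions, the standard normalization fills every empty position with a nonzero value, while the masked variant leaves the set of active positions unchanged, so later sparse operations could still skip the empty regions.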