3D object detection using point cloud data is vital for autonomous driving perception pipelines, where efficient encoding is key to meeting stringent resource and latency requirements. PointPillars, a widely adopted bird's-eye view (BEV) encoding, aggregates 3D point cloud data into 2D pillars for high-accuracy 3D object detection. However, most state-of-the-art methods employing PointPillars overlook the inherent sparsity of pillar encoding, missing opportunities for significant computational reduction. In this study, we propose an algorithm-hardware co-design that accelerates sparse convolution processing and maximizes sparsity utilization in pillar-based 3D object detection networks. We investigate sparsification opportunities using an advanced pillar-pruning method, achieving an optimal balance between accuracy and sparsity. We introduce PillarAcc, a state-of-the-art sparsity support mechanism that enhances sparse pillar convolution through linear-complexity input-output mapping generation and conflict-free gather-scatter memory access. Additionally, we propose dataflow optimization techniques that dynamically adjust the pillar processing schedule for optimal hardware utilization under diverse sparsity operations. We evaluate PillarAcc on various cutting-edge 3D object detection networks and benchmarks, achieving remarkable speedup and energy savings compared to representative edge platforms, and demonstrating a record-breaking PointPillars speed of 500 FPS with minimal compromise in accuracy.
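To make the idea of input-output mapping generation followed by gather-scatter concrete, the following is a minimal software sketch of a generic sparse 2D convolution over active pillars. It is not PillarAcc's hardware mechanism and does not model conflict-free memory access. It assumes a submanifold-style convolution (outputs only at active input pillars), and the names `build_rulebook`, `sparse_pillar_conv`, and `dense_reference` are hypothetical, introduced only for illustration.

```python
from itertools import product


def build_rulebook(coords, kernel_size=3):
    """Map each kernel offset to a list of (input_idx, output_idx) pairs.

    Assumption: submanifold-style convolution, so outputs exist only at
    active input pillars. One dict lookup per (pillar, offset) keeps the
    cost linear in the number of active pillars for a fixed kernel size.
    """
    index_of = {tuple(c): i for i, c in enumerate(coords)}
    r = kernel_size // 2
    rulebook = {}
    for out_idx, (y, x) in enumerate(coords):
        for dy, dx in product(range(-r, r + 1), repeat=2):
            in_idx = index_of.get((y + dy, x + dx))
            if in_idx is not None:  # skip empty neighbours entirely
                rulebook.setdefault((dy, dx), []).append((in_idx, out_idx))
    return rulebook


def sparse_pillar_conv(coords, feats, weights, kernel_size=3):
    """Gather-scatter convolution; weights[(dy, dx)] is a C_in x C_out matrix."""
    c_out = len(next(iter(weights.values()))[0])
    out = [[0.0] * c_out for _ in coords]
    for offset, pairs in build_rulebook(coords, kernel_size).items():
        w = weights[offset]
        for in_idx, out_idx in pairs:
            f = feats[in_idx]  # gather the input pillar's features
            for o in range(c_out):  # multiply, then scatter-accumulate
                out[out_idx][o] += sum(f[c] * w[c][o] for c in range(len(f)))
    return out


def dense_reference(coords, feats, weights, grid_hw, kernel_size=3):
    """Same convolution on a zero-filled dense grid, read at active sites."""
    h, w = grid_hw
    c_in, c_out = len(feats[0]), len(next(iter(weights.values()))[0])
    grid = [[[0.0] * c_in for _ in range(w)] for _ in range(h)]
    for (y, x), f in zip(coords, feats):
        grid[y][x] = list(f)
    r = kernel_size // 2
    out = []
    for y, x in coords:
        acc = [0.0] * c_out
        for dy, dx in product(range(-r, r + 1), repeat=2):
            yy, xx = y + dy, x + dx
            if 0 <= yy < h and 0 <= xx < w:
                for c in range(c_in):
                    for o in range(c_out):
                        acc[o] += grid[yy][xx][c] * weights[(dy, dx)][c][o]
        out.append(acc)
    return out


# Toy example: 4 active pillars on a 4x4 BEV grid, 2 input and 3 output channels.
coords = [(0, 0), (0, 1), (2, 2), (3, 3)]
feats = [[1.0, 2.0], [3.0, -1.0], [0.0, 4.0], [2.0, 2.0]]
weights = {
    (dy, dx): [[float(dy + 2 * dx + c - o) for o in range(3)] for c in range(2)]
    for dy, dx in product(range(-1, 2), repeat=2)
}
sparse_out = sparse_pillar_conv(coords, feats, weights)
dense_out = dense_reference(coords, feats, weights, grid_hw=(4, 4))
num_pairs = sum(len(p) for p in build_rulebook(coords).values())
```

In this toy grid the rulebook holds only the input-output pairs that involve two active pillars, so the multiply-accumulate work scales with the number of active pillars rather than with the grid area. That is the kind of saving a sparsity-aware design aims to exploit.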