3D object detection using point cloud (PC) data is vital for autonomous driving perception pipelines, where efficient encoding is key to meeting stringent resource and latency requirements. PointPillars, a widely adopted bird's-eye view (BEV) encoding, aggregates 3D point cloud data into 2D pillars for high-accuracy 3D object detection. However, most state-of-the-art methods employing PointPillars overlook the inherent sparsity of pillar encoding, missing opportunities for significant computational reduction. In this study, we propose an algorithm-hardware co-design that accelerates sparse convolution processing and maximizes sparsity utilization in pillar-based 3D object detection networks. We investigate sparsification opportunities using an advanced pillar-pruning method that balances accuracy against sparsity. We introduce PillarAcc, a sparsity support mechanism that enhances sparse pillar convolution through linear-complexity input-output mapping generation and conflict-free gather-scatter memory access. Additionally, we propose dataflow optimization techniques that dynamically adjust the pillar processing schedule for optimal hardware utilization under diverse sparsity operations. We evaluate PillarAcc on various cutting-edge 3D object detection networks and benchmarks, achieving substantial speedup and energy savings compared to representative edge platforms and demonstrating a record-breaking PointPillars speed of 500 FPS with minimal compromise in accuracy.
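The linear-complexity input-output mapping and gather-scatter access mentioned in the abstract can be illustrated in software. The following is only a minimal sketch, not PillarAcc's hardware mechanism: it assumes a submanifold-style 2D convolution (outputs exist only at active pillars), a single scalar feature per pillar, and a hash-table lookup of the kind commonly used in software sparse-convolution implementations. The names `build_rulebook` and `sparse_conv` are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]          # (row, col) of a pillar in the BEV grid
Pair = Tuple[int, int]           # (input_index, output_index)


def build_rulebook(active: List[Coord], kernel_size: int = 3) -> Dict[Coord, List[Pair]]:
    """Map each kernel offset to its (input_index, output_index) pairs.

    Assumption: submanifold convention, so outputs exist only at active
    input pillars. Each (pillar, offset) costs one hash lookup, so the total
    work is O(N * K^2): linear in the number of active pillars N.
    """
    index_of = {coord: i for i, coord in enumerate(active)}
    radius = kernel_size // 2
    rulebook: Dict[Coord, List[Pair]] = {}
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            pairs: List[Pair] = []
            for out_idx, (y, x) in enumerate(active):
                in_idx = index_of.get((y + dy, x + dx))
                if in_idx is not None:      # empty neighbours are skipped entirely
                    pairs.append((in_idx, out_idx))
            rulebook[(dy, dx)] = pairs
    return rulebook


def sparse_conv(features: List[float],
                rulebook: Dict[Coord, List[Pair]],
                weights: Dict[Coord, float]) -> List[float]:
    """Gather input features, multiply by the offset's weight, scatter-accumulate."""
    out = [0.0] * len(features)
    for offset, pairs in rulebook.items():
        w = weights[offset]
        for in_idx, out_idx in pairs:
            out[out_idx] += w * features[in_idx]
    return out


# Three active pillars in an otherwise empty grid: two adjacent, one isolated.
active = [(0, 0), (0, 1), (5, 5)]
features = [1.0, 2.0, 4.0]
rulebook = build_rulebook(active)
weights = {offset: 1.0 for offset in rulebook}   # all-ones 3x3 kernel
result = sparse_conv(features, rulebook, weights)
print(result)  # [3.0, 3.0, 4.0]
```

With an all-ones kernel, each output is the sum of the active pillars in its 3x3 neighbourhood, so the two adjacent pillars each yield 3.0 and the isolated pillar keeps its own value. Only five multiply-accumulates are performed here, whereas a dense convolution would visit every cell of the grid.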