SPADE: Sparse Pillar-based 3D Object Detection Accelerator for Autonomous Driving

3D object detection using point cloud (PC) data is essential to the perception pipelines of autonomous driving, where efficient encoding is key to meeting stringent resource and latency requirements. PointPillars, a widely adopted bird's-eye view (BEV) encoding, aggregates 3D point cloud data into 2D pillars for fast and accurate 3D object detection. However, state-of-the-art methods employing PointPillars overlook the inherent sparsity of pillar encoding, in which only valid pillars are encoded as vectors of channel elements, and thus miss opportunities for significant computational reduction. Meanwhile, current sparse convolution accelerators are designed to handle only element-wise activation sparsity and do not effectively address the vector sparsity imposed by pillar encoding. In this paper, we propose SPADE, an algorithm-hardware co-design strategy that maximizes vector sparsity in pillar-based 3D object detection and accelerates vector-sparse convolution commensurately with the improved sparsity. SPADE consists of three components: (1) a dynamic vector pruning algorithm that balances accuracy against the computation savings from vector sparsity, (2) sparse coordinate management hardware that transforms a 2D systolic array into a vector-sparse convolution accelerator, and (3) a sparsity-aware dataflow optimization that tailors sparse convolution schedules for hardware efficiency. Taped out in a commercial technology, SPADE reduces the amount of computation by 36.3--89.2\% for representative 3D object detection networks and benchmarks, leading to 1.3--10.9$\times$ speedup and 1.5--12.6$\times$ energy savings compared to the ideal dense accelerator design. These sparsity-proportional performance gains equate to 4.1--28.8$\times$ speedup and 90.2--372.3$\times$ energy savings compared to counterpart server and edge platforms.
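To make the notion of vector sparsity concrete, the following minimal sketch counts multiply-accumulate (MAC) operations for one stride-1 convolution layer over a BEV grid, once treating every location as populated and once visiting only the active pillars. It is an illustration under stated assumptions, not SPADE's pruning algorithm or hardware schedule: the function name `count_macs`, its parameters, and the choice to count only in-bounds kernel taps are all hypothetical.

```python
def count_macs(height, width, active, c_in, c_out, k=3):
    """Count MACs for a k x k, stride-1, zero-padded convolution on a BEV grid.

    `active` is a set of (row, col) coordinates of non-empty pillars. Each
    active pillar carries a full c_in-element channel vector; every other
    location is treated as an all-zero vector (vector sparsity).
    Returns (dense_macs, sparse_macs). Only kernel taps that fall inside
    the grid are counted (an assumption made for this sketch).
    """
    r = k // 2
    per_tap = c_in * c_out  # MACs for one input vector at one kernel tap

    def outputs_touched(y, x):
        # Number of output locations whose receptive field covers (y, x).
        rows = min(height - 1, y + r) - max(0, y - r) + 1
        cols = min(width - 1, x + r) - max(0, x - r) + 1
        return rows * cols

    # Dense: every input location contributes, zero or not.
    dense = per_tap * sum(
        outputs_touched(y, x) for y in range(height) for x in range(width)
    )
    # Vector-sparse: only active pillars contribute; zero vectors are skipped.
    sparse = per_tap * sum(outputs_touched(y, x) for (y, x) in active)
    return dense, sparse


if __name__ == "__main__":
    dense, sparse = count_macs(4, 4, {(1, 1)}, c_in=2, c_out=3)
    print(dense, sparse)  # prints "600 54"
```

In this toy case a single active pillar on a 4 x 4 grid leaves 54 of 600 MACs. The saving depends only on which pillars are active, not on individual zero elements within a channel vector, which is the distinction the abstract draws from element-wise activation sparsity.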
