3D object detection from point cloud (PC) data is essential to the perception pipelines of autonomous driving, where efficient encoding is key to meeting stringent resource and latency requirements. PointPillars, a widely adopted bird's-eye view (BEV) encoding, aggregates 3D point cloud data into 2D pillars for fast and accurate 3D object detection. However, state-of-the-art methods that employ PointPillars overlook the inherent sparsity of pillar encoding, in which only valid pillars are encoded as vectors of channel elements, and thus miss opportunities for significant computational reduction. Meanwhile, current sparse convolution accelerators are designed to handle only element-wise activation sparsity and do not effectively address the vector sparsity imposed by pillar encoding. In this paper, we propose SPADE, an algorithm-hardware co-design strategy that maximizes vector sparsity in pillar-based 3D object detection and accelerates vector-sparse convolution commensurate with the improved sparsity. SPADE consists of three components: (1) a dynamic vector pruning algorithm that balances accuracy against the computation savings from vector sparsity, (2) sparse coordinate management hardware that transforms a 2D systolic array into a vector-sparse convolution accelerator, and (3) a sparsity-aware dataflow optimization that tailors sparse convolution schedules for hardware efficiency. Taped out in a commercial technology, SPADE reduces computation by 36.3--89.2\% for representative 3D object detection networks and benchmarks, leading to 1.3--10.9$\times$ speedup and 1.5--12.6$\times$ energy savings compared to an ideal dense accelerator design. These sparsity-proportional performance gains equate to 4.1--28.8$\times$ speedup and 90.2--372.3$\times$ energy savings compared to counterpart server and edge platforms.
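To make the distinction between element-wise sparsity and vector sparsity concrete, the following is a minimal NumPy sketch. It is not SPADE's pruning algorithm or hardware dataflow. The toy BEV grid, the function names (`pillar_occupancy`, `vector_sparsity`, `element_sparsity`, `conv_macs`), and the MAC-counting model are all assumptions made for illustration. The model counts multiply-accumulates for a stride-1 convolution and ignores border effects and the way active outputs dilate around active inputs.

```python
import numpy as np

def pillar_occupancy(bev):
    """Boolean (H, W) mask: True where a pillar's channel vector is non-zero."""
    return np.any(bev != 0, axis=-1)

def vector_sparsity(bev):
    """Fraction of pillars whose entire channel vector is zero."""
    return 1.0 - pillar_occupancy(bev).mean()

def element_sparsity(bev):
    """Fraction of individual activation elements that are zero."""
    return 1.0 - np.count_nonzero(bev) / bev.size

def conv_macs(bev, c_out, k=3):
    """Return (dense, vector_sparse) MAC counts for a k x k convolution.

    dense: every input pillar contributes to all k*k kernel offsets.
    vector_sparse: only non-empty input pillars contribute.
    Simplified model (assumption): stride 1, border effects ignored.
    """
    h, w, c_in = bev.shape
    macs_per_pillar = k * k * c_in * c_out
    dense = h * w * macs_per_pillar
    sparse = int(pillar_occupancy(bev).sum()) * macs_per_pillar
    return dense, sparse

# Toy 4 x 4 BEV grid with 2 channels; only 4 of 16 pillars are valid.
bev = np.zeros((4, 4, 2), dtype=np.float32)
bev[0, 0] = [1.0, 0.5]
bev[1, 2] = [0.0, 2.0]  # a valid pillar may still contain zero elements
bev[3, 1] = [3.0, 0.0]
bev[3, 3] = [0.2, 0.1]

print(vector_sparsity(bev))    # 0.75
print(element_sparsity(bev))   # 0.8125
print(conv_macs(bev, c_out=3)) # (864, 216)
```

In this toy grid, skipping empty pillars removes 75% of the MACs using one occupancy bit per pillar, whereas exploiting the slightly higher element-wise sparsity would require tracking every channel element individually.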