Point-cloud-based 3D perception has attracted considerable attention in applications such as robotics, autonomous driving, and AR/VR. In particular, the 3D sparse convolution (SpConv) network has emerged as one of the most popular backbones owing to its excellent performance. However, it poses severe challenges to real-time perception on general-purpose platforms, such as lengthy map search latency, high computation cost, and an enormous memory footprint. In this paper, we propose SpOctA, a SpConv accelerator that enables high-speed and energy-efficient point cloud processing. SpOctA parallelizes the map search through algorithm-architecture co-optimization based on octree encoding, achieving an 8.8-21.2x search speedup. It also reduces the heavy computational workload by exploiting the inherent sparsity of each voxel, which eliminates redundant computation and cuts processing latency by 44.4-79.1%. To optimize on-chip memory management, a SpConv-oriented non-uniform caching strategy is introduced, reducing external memory access energy by 57.6% on average. Implemented in a 40 nm technology and extensively evaluated on representative benchmarks, SpOctA achieves a 1.1-6.9x speedup and a 1.5-3.1x energy efficiency improvement over state-of-the-art SpConv accelerators.
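For readers unfamiliar with the map search step, the following is a minimal software sketch of what it computes in a submanifold SpConv layer. It is not SpOctA's hardware design. It assumes a generic scheme in which each voxel coordinate is turned into an octree (Morton) key by bit interleaving and the keys are stored in a hash table. The function names `morton_encode` and `map_search`, the 10-bit coordinate width, and the 3x3x3 kernel are illustrative assumptions.

```python
from itertools import product


def morton_encode(x, y, z, bits=10):
    """Interleave the bits of (x, y, z) into a single octree (Morton) key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (3 * i)
        key |= ((y >> i) & 1) << (3 * i + 1)
        key |= ((z >> i) & 1) << (3 * i + 2)
    return key


def map_search(voxels, kernel_size=3, bits=10):
    """Return {kernel offset: [(in_idx, out_idx), ...]} for occupied voxels.

    Assumes a submanifold layer: output sites equal input sites.
    """
    # Hash table from octree key to voxel index (hypothetical software stand-in).
    table = {morton_encode(x, y, z, bits): i for i, (x, y, z) in enumerate(voxels)}
    r = kernel_size // 2
    limit = 1 << bits
    maps = {}
    for offset in product(range(-r, r + 1), repeat=3):
        pairs = []
        for out_idx, (x, y, z) in enumerate(voxels):
            nx, ny, nz = x + offset[0], y + offset[1], z + offset[2]
            # Skip neighbors that fall outside the encodable coordinate range.
            if not (0 <= nx < limit and 0 <= ny < limit and 0 <= nz < limit):
                continue
            in_idx = table.get(morton_encode(nx, ny, nz, bits))
            if in_idx is not None:
                pairs.append((in_idx, out_idx))
        if pairs:
            maps[offset] = pairs
    return maps


voxels = [(0, 0, 0), (1, 0, 0), (5, 5, 5)]
maps = map_search(voxels)
for offset, pairs in sorted(maps.items()):
    print(offset, pairs)
```

In this toy input only 5 of the 81 possible (offset, voxel) combinations yield a valid input-output pair, which illustrates the kind of per-voxel sparsity the abstract refers to. Each lookup is independent of the others, which is what makes the search a natural target for parallelization.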