Point Cloud Networks (PCNs) have achieved great success in many point cloud tasks, such as object part segmentation and shape classification. The most popular point-based PCNs consist of two sequential steps: Data Structuring (DS) and Feature Computation (FC). In this paper, we first identify an important characteristic of the PCN-specific DS step that existing PCN accelerators have not addressed: the spatial locality that results from points overlapping across the gathered point subsets. Through algorithm-hardware co-design, L-PCN (Locality-aware PCN) introduces two novel techniques that exploit this characteristic to eliminate a large amount of repetitive work across the whole PCN. The first is a point cloud partitioning technique, Octree-based Islandization. Using Octree-based adjacency gathering, L-PCN partitions a point cloud into islands such that the point subsets within the same island exhibit strong spatial correlation. After partitioning, L-PCN performs the remaining PCN steps at island granularity. The second is Hub-based Scheduling, which schedules intra-island computation to exploit intra-island data reuse by dynamically caching, updating, and reusing repeated data. Both techniques are implemented in an Islandization Unit that can be seamlessly integrated into a standard PCN workflow. Our evaluation shows that, by exploiting spatial locality, L-PCN achieves a theoretical reduction of 55.2% to 93.8% in feature fetching and 45.4% to 80.6% in feature computation over the PCN process. We implement prototype L-PCN accelerators on the Intel Arria 10 GX FPGA. Experimental results show that, with the Islandization Unit as a plug-in, state-of-the-art PCN accelerators achieve an additional speedup of 1.2x to 3.2x.
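To make the locality argument concrete, the sketch below counts how many neighbor-feature fetches a DS step would issue with and without reuse inside islands. It is a minimal illustration under stated assumptions, not the L-PCN hardware or the authors' algorithm. Assumptions: islands are formed by grouping centroids that fall into the same octree cell at a fixed depth (keyed by Morton codes, a common octree indexing scheme swapped in here; the paper's Octree-based adjacency gathering may differ), neighbors are gathered by brute-force kNN, and an idealized per-island cache of unbounded capacity stands in for Hub-based Scheduling. The function names `morton_code`, `islandize`, `knn_gather`, and `fetch_counts` are hypothetical.

```python
import numpy as np


def morton_code(cell: np.ndarray, depth: int) -> np.ndarray:
    """Interleave the bits of integer (x, y, z) cell indices into one octree key."""
    code = np.zeros(cell.shape[0], dtype=np.int64)
    for bit in range(depth):
        for axis in range(3):
            code |= ((cell[:, axis] >> bit) & 1).astype(np.int64) << (3 * bit + axis)
    return code


def islandize(centroids: np.ndarray, depth: int) -> list:
    """Hypothetical islandization: centroids sharing an octree cell form one island."""
    lo = centroids.min(axis=0)
    span = max(float(np.ptp(centroids, axis=0).max()), 1e-9)  # cubic root cell
    n_cells = 1 << depth  # cells per axis at this octree depth
    cell = np.floor((centroids - lo) / span * n_cells).astype(np.int64)
    cell = np.clip(cell, 0, n_cells - 1)  # points on the max face go to the last cell
    islands = {}
    for i, key in enumerate(morton_code(cell, depth)):
        islands.setdefault(int(key), []).append(i)
    return list(islands.values())


def knn_gather(points: np.ndarray, centroids: np.ndarray, k: int) -> np.ndarray:
    """Brute-force DS step: indices of the k nearest points for every centroid."""
    d2 = ((centroids[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    return np.argsort(d2, axis=1)[:, :k]


def fetch_counts(neighbors: np.ndarray, islands: list) -> tuple:
    """Fetches without reuse vs. with an idealized unbounded per-island cache."""
    naive = int(neighbors.size)  # every gathered point is fetched separately
    cached = sum(len(np.unique(neighbors[isl])) for isl in islands)  # once per island
    return naive, cached


rng = np.random.default_rng(0)
points = rng.random((2048, 3))
centroids = points[rng.choice(len(points), 256, replace=False)]
nbrs = knn_gather(points, centroids, k=16)
islands = islandize(centroids, depth=2)
naive, cached = fetch_counts(nbrs, islands)
print(f"feature fetches: {naive} naive vs {cached} with per-island reuse "
      f"({1 - cached / naive:.1%} fewer)")
```

The printed reduction depends on the random cloud, `k`, and the octree depth, and is not expected to reproduce the figures reported above. In this toy model, a shallower octree produces larger islands with more shared neighbors, which in a real accelerator would also demand more on-chip buffer space; this is the trade-off a scheduling scheme such as Hub-based Scheduling has to manage.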