Graph Neural Networks (GNNs) have gained significant momentum recently due to their ability to learn from unstructured graph data. Dynamic GNNs (DGNNs) are the current state of the art for point cloud applications. Such applications (e.g., autonomous driving) require real-time processing at the edge under tight latency and memory constraints. Conducting performance analysis on such DGNNs thus becomes a crucial task for evaluating network suitability. This paper presents a profiling analysis of EdgeConv-based DGNNs applied to point cloud inputs. We assess their inference performance in terms of end-to-end latency and memory consumption on state-of-the-art CPU and GPU platforms. The EdgeConv layer has two stages: (1) dynamic graph generation using k-Nearest Neighbors (kNN), and (2) node feature update. Regenerating the graph via kNN in each EdgeConv layer improves network performance compared to networks that reuse the same static graph in every layer. This improvement, however, comes at the added computational cost of the dynamic graph generation stage. Understanding this cost is essential for identifying the performance bottleneck and exploring potential avenues for hardware acceleration. To this end, this paper aims to shed light on the performance characteristics of EdgeConv-based DGNNs for point cloud inputs. Our performance analysis of a state-of-the-art EdgeConv network for classification shows that dynamic graph construction via kNN accounts for upwards of 95% of network latency on the GPU and almost 90% on the CPU. Moreover, we propose a quasi-Dynamic Graph Neural Network (qDGNN) that halts dynamic graph updates after a specific depth within the network. This significantly reduces latency on both CPU and GPU whilst matching the original network's inference accuracy.
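The following minimal NumPy sketch makes the two EdgeConv stages and the quasi-dynamic idea concrete. It is not the implementation profiled in this paper. It makes several assumptions. It uses the DGCNN-style edge function h(x_i, x_j - x_i) with max aggregation. It replaces the shared MLP with a single linear layer followed by ReLU. It excludes each point from its own neighborhood. The names `knn_indices`, `edge_conv` and `forward`, and the `dynamic_depth` parameter, are hypothetical. `dynamic_depth` models one reading of qDGNN, in which layers beyond a given depth reuse the last kNN graph instead of recomputing it.

```python
import numpy as np


def knn_indices(x, k):
    """Stage 1: indices of the k nearest neighbors of every point.

    x: (N, F) array of point features; returns an (N, k) int array.
    Assumption of this sketch: a point is not its own neighbor.
    """
    sq = np.sum(x * x, axis=1)
    # Pairwise squared Euclidean distances: |a|^2 - 2 a.b + |b|^2  -> (N, N)
    dist = sq[:, None] - 2.0 * x @ x.T + sq[None, :]
    np.fill_diagonal(dist, np.inf)  # exclude self-matches
    return np.argsort(dist, axis=1)[:, :k]


def edge_conv(x, idx, weight):
    """Stage 2: node feature update with h(x_i, x_j - x_i) and max aggregation.

    x: (N, F), idx: (N, k), weight: (2F, F_out) shared linear layer
    (a simplification of the MLP used in DGCNN).
    """
    neighbors = x[idx]                                        # (N, k, F)
    centre = np.broadcast_to(x[:, None, :], neighbors.shape)  # (N, k, F)
    edge_feat = np.concatenate([centre, neighbors - centre], axis=-1)  # (N, k, 2F)
    h = np.maximum(edge_feat @ weight, 0.0)                   # linear layer + ReLU
    return h.max(axis=1)                                      # (N, F_out)


def forward(points, weights, k, dynamic_depth=None):
    """Stack of EdgeConv layers.

    dynamic_depth=None rebuilds the kNN graph in every layer (fully dynamic).
    An integer d rebuilds it only in the first d layers and then reuses the
    last graph (hypothetical quasi-dynamic variant). Returns the output
    features and the number of kNN graph constructions performed.
    """
    x, idx, knn_calls = points, None, 0
    for layer, w in enumerate(weights):
        if idx is None or dynamic_depth is None or layer < dynamic_depth:
            idx = knn_indices(x, k)  # graph built in the current feature space
            knn_calls += 1
        x = edge_conv(x, idx, w)
    return x, knn_calls


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    points = rng.standard_normal((1024, 3))
    dims = [3, 64, 64, 128, 256]
    weights = [rng.standard_normal((2 * a, b)) * 0.1
               for a, b in zip(dims[:-1], dims[1:])]
    _, full = forward(points, weights, k=20)
    out, quasi = forward(points, weights, k=20, dynamic_depth=1)
    print(full, quasi, out.shape)  # prints: 4 1 (1024, 256)
```

In this sketch, the kNN stage builds an N x N distance matrix and sorts every row, so its cost grows quadratically with the number of points. The feature update scales with N * k. Skipping graph reconstruction in later layers therefore removes the dominant quadratic term from those layers.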