Owing to their remarkable capability to represent heterogeneous graph data, Heterogeneous Graph Neural Networks (HGNNs) have been widely adopted in many critical real-world domains, such as recommendation systems and medical analysis. Before an HGNN can be deployed, extensive training is needed to identify the optimal model parameters for a specific task, and this process is time-consuming and costly. To improve the efficiency of HGNN training, it is essential to characterize and analyze the execution semantics and patterns of the training process and thereby identify its performance bottlenecks. In this study, we conduct an in-depth quantification and analysis of two mainstream HGNN training scenarios: single-GPU training and multi-GPU distributed training. Based on the characterization results, we reveal the performance bottlenecks and their underlying causes in each training scenario and provide optimization guidelines from both software and hardware perspectives.