Memory performance is a bottleneck in graph analytics acceleration. Existing machine learning (ML) prefetchers struggle with the phase transitions and irregular memory accesses of graph processing. We propose MPGraph, an ML-based prefetcher for graph analytics that uses domain-specific models. MPGraph introduces three novel optimizations: soft detection of phase transitions, phase-specific multi-modality models for access delta and page prediction, and chain spatio-temporal prefetching (CSTP) for prefetch control. Our transition detector achieves 34.17-82.15% higher precision than Kolmogorov-Smirnov Windowing and decision tree baselines. Our predictors achieve 6.80-16.02% higher F1-score for delta prediction and 11.68-15.41% higher accuracy-at-10 for page prediction than LSTM and vanilla attention models. Using CSTP, MPGraph achieves 12.52-21.23% IPC improvement, outperforming the state-of-the-art non-ML prefetcher BO by 7.58-12.03% and the ML-based prefetchers Voyager and TransFetch by 3.27-4.58%. For practical implementation, we demonstrate that MPGraph with compressed, lower-latency models achieves significantly higher accuracy and coverage than BO, leading to 3.58% higher IPC improvement.
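The accuracy-at-10 metric quoted for page prediction can be illustrated with a minimal sketch. It assumes the common top-k definition (a prediction counts as correct when the true page class is among the k highest-scored candidates); the abstract does not spell out the exact definition, and the function name `accuracy_at_k` and the toy scores are hypothetical, not taken from MPGraph.

```python
from typing import Sequence


def accuracy_at_k(
    scores: Sequence[Sequence[float]],
    targets: Sequence[int],
    k: int = 10,
) -> float:
    """Fraction of samples whose true class is among the k highest-scored classes.

    Assumed top-k definition; `scores[i][c]` is the model score for class c
    on sample i, and `targets[i]` is the index of the true class.
    """
    if len(scores) != len(targets):
        raise ValueError("scores and targets must have the same length")
    if not targets:
        return 0.0

    hits = 0
    for row, target in zip(scores, targets):
        # Indices of the k classes with the highest scores for this sample.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += target in top_k
    return hits / len(targets)


# Toy example: three samples, three candidate page classes (hypothetical data).
toy_scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
toy_targets = [1, 2, 0]

acc_at_1 = accuracy_at_k(toy_scores, toy_targets, k=1)
acc_at_3 = accuracy_at_k(toy_scores, toy_targets, k=3)
```

With k equal to the number of classes the metric is trivially 1.0, so it is only informative when k is much smaller than the page vocabulary, as is presumably the case for k = 10 here.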