The SpMV kernel exhibits high performance variation across input matrices and computing platforms. While GPUs have been considered the state of the art for SpMV, the emergence of advanced multicore CPUs and low-power FPGA accelerators means we need to revisit its performance and energy efficiency. This paper provides a high-level SpMV performance analysis based on structural features of matrices that relate to common bottlenecks: memory-bandwidth intensity, low instruction-level parallelism (ILP), load imbalance, and memory latency overheads. To this end, we create a large artificial matrix dataset that spans these features and study the performance of different storage formats on nine modern HPC platforms: five CPUs, three GPUs, and an FPGA. After validating our proposed methodology using real-world matrices, we analyze our extensive experimental results and draw key insights on the competitiveness of different target architectures for SpMV and on the impact of each feature/bottleneck on its performance.
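For readers less familiar with the kernel, the sketch below shows a plain CSR (compressed sparse row) SpMV, y = A·x, together with a few simple structural metrics that loosely correspond to the bottlenecks named above. The total memory footprint relates to bandwidth intensity. A low average number of nonzeros per row leaves little work per row and therefore limits ILP. The ratio of the maximum to the average nonzeros per row is a rough proxy for load imbalance. The average column gap within a row approximates how scattered the accesses to x are, which drives memory latency. This is an illustrative sketch only: the `CSRMatrix` class, the metric names, and the assumption of 8-byte values with 4-byte indices are hypothetical and are not the paper's feature definitions or dataset generator.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CSRMatrix:
    """Hypothetical minimal CSR container (illustration only)."""
    n_rows: int
    n_cols: int
    row_ptr: list[int]   # length n_rows + 1; row i spans [row_ptr[i], row_ptr[i+1])
    col_idx: list[int]   # column index of each nonzero
    values: list[float]  # value of each nonzero


def spmv_csr(A: CSRMatrix, x: list[float]) -> list[float]:
    """Compute y = A @ x, one row at a time."""
    if len(x) != A.n_cols:
        raise ValueError("dimension mismatch between A and x")
    y = [0.0] * A.n_rows
    for i in range(A.n_rows):
        acc = 0.0
        for k in range(A.row_ptr[i], A.row_ptr[i + 1]):
            # Indirect access x[col_idx[k]]: irregular columns hurt locality.
            acc += A.values[k] * x[A.col_idx[k]]
        y[i] = acc
    return y


def structural_features(A: CSRMatrix) -> dict[str, float]:
    """Simple, hypothetical proxies for the SpMV bottlenecks."""
    nnz = A.row_ptr[-1]
    per_row = [A.row_ptr[i + 1] - A.row_ptr[i] for i in range(A.n_rows)]
    avg = nnz / A.n_rows if A.n_rows else 0.0
    max_row = max(per_row, default=0)

    # Bytes touched by one SpMV, ASSUMING 8-byte values and 4-byte indices.
    footprint = 8 * nnz + 4 * nnz + 4 * (A.n_rows + 1) + 8 * A.n_cols + 8 * A.n_rows

    # Mean distance between consecutive column indices inside each row.
    gaps = [
        abs(A.col_idx[k + 1] - A.col_idx[k])
        for i in range(A.n_rows)
        for k in range(A.row_ptr[i], A.row_ptr[i + 1] - 1)
    ]
    avg_gap = sum(gaps) / len(gaps) if gaps else 0.0

    return {
        "nnz": nnz,
        "avg_nnz_per_row": avg,                         # low value: little ILP per row
        "max_nnz_per_row": max_row,
        "imbalance": max_row / avg if avg else 0.0,     # load-imbalance proxy
        "footprint_bytes": footprint,                   # bandwidth-intensity proxy
        "avg_col_gap": avg_gap,                         # memory-latency proxy
    }


# Usage: the 3x3 matrix [[4, 0, 1], [0, 2, 0], [3, 0, 5]] in CSR form.
A = CSRMatrix(3, 3, [0, 2, 3, 5], [0, 2, 1, 0, 2], [4.0, 1.0, 2.0, 3.0, 5.0])
y = spmv_csr(A, [1.0, 2.0, 3.0])
features = structural_features(A)
```

Real SpMV studies use many more formats (e.g. COO, ELL, blocked variants) and far richer feature sets; CSR is shown here only because it is the most common baseline.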