We study the implementation of the even-odd Wilson fermion matrix for lattice QCD simulations on the A64FX architecture. We investigate efficient coding of the stencil operation when sites are packed two-dimensionally into SIMD vectors. We measure the sustained performance on the supercomputer Fugaku at RIKEN R-CCS and present profiler results for our code. These show the detailed efficiency of each part of the code and may also signal an unexpected source of slowdown.
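As an illustration of what two-dimensional packing implies for a stencil, the following is a minimal sketch and not the code studied here. It assumes a 4 x 4 tile of x-y sites per vector (16 single-precision lanes, matching the 512-bit SIMD width of A64FX), uses plain C loops in place of SVE intrinsics, and omits the even-odd compaction. The lattice size and all names (`packed_index`, `shift_xplus`, `count_mismatches`) are hypothetical.

```c
#include <stdio.h>

/* Assumed tile shape: VX x VY sites of the x-y plane share one SIMD vector. */
enum { VX = 4, VY = 4, VLEN = VX * VY };
/* Assumed (small) extent of the x-y plane and the resulting tile counts. */
enum { NX = 8, NY = 8 };
enum { TX = NX / VX, TY = NY / VY };

/* Map site (x, y) to its offset in the packed array: tile first, then lane. */
static int packed_index(int x, int y)
{
    int tile = (x / VX) + TX * (y / VY);
    int lane = (x % VX) + VX * (y % VY);
    return tile * VLEN + lane;
}

/* out(x, y) = in(x + 1, y) with periodic boundary, processed tile by tile.
 * Lanes in the last column of a tile take their value from the first
 * column of the neighbouring tile; all other lanes stay within the tile. */
static void shift_xplus(float *out, const float *in)
{
    for (int ty = 0; ty < TY; ++ty) {
        for (int tx = 0; tx < TX; ++tx) {
            const float *cur = in + VLEN * (tx + TX * ty);
            const float *nbr = in + VLEN * ((tx + 1) % TX + TX * ty);
            float *dst = out + VLEN * (tx + TX * ty);
            for (int ly = 0; ly < VY; ++ly) {
                for (int lx = 0; lx < VX; ++lx) {
                    int lane = lx + VX * ly;
                    dst[lane] = (lx + 1 < VX) ? cur[lane + 1] : nbr[VX * ly];
                }
            }
        }
    }
}

/* Compare the packed shift against a direct site-by-site reference. */
static int count_mismatches(void)
{
    float in[NX * NY], out[NX * NY];
    int bad = 0;
    for (int y = 0; y < NY; ++y)
        for (int x = 0; x < NX; ++x)
            in[packed_index(x, y)] = (float)(x + NX * y);
    shift_xplus(out, in);
    for (int y = 0; y < NY; ++y)
        for (int x = 0; x < NX; ++x)
            if (out[packed_index(x, y)] != (float)((x + 1) % NX + NX * y))
                ++bad;
    return bad;
}
```

In a vectorized version, the inner two loops would become a lane permutation that combines two vector registers. With even-odd ordering, the lanes that cross a tile boundary would additionally depend on the parity of the row, which this sketch does not model.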