We present a high-performance evaluation method for 4-center 2-particle integrals over Gaussian atomic orbitals (AOs) with high angular momenta ($l\geq4$) and arbitrary contraction degrees on graphics processing units (GPUs) and other accelerators. The implementation uses the matrix form of the McMurchie-Davidson recurrences. Evaluation of the 4-center integrals over four $l=6$ ($i$) Gaussian AOs in double precision (FP64) on an NVIDIA V100 GPU outperforms the reference implementation of the Obara-Saika recurrences (${\tt Libint}$) running on a single Intel Xeon core by more than a factor of 1000, easily exceeding the 73:1 ratio of the respective hardware peak FLOP rates while reaching almost 50\% of the V100 peak. The approach can be extended to support AOs with even higher angular momenta; for lower angular momenta ($l\leq3$) additional improvements will be reported elsewhere. The implementation is part of the open-source ${\tt LibintX}$ library, freely available at https://github.com/ValeevGroup/LibintX.