We report an implementation of the McMurchie-Davidson (MD) algorithm for 3-center and 4-center 2-particle integrals over Gaussian atomic orbitals (AOs) with low and high angular momenta $l$ and varying degrees of contraction for graphics processing units (GPUs). This work builds upon our recent implementation of a matrix form of the MD algorithm that is efficient for GPU evaluation of 4-center 2-particle integrals over Gaussian AOs of high angular momenta ($l\geq 4$) [$\mathit{J. Phys. Chem. A}\ \mathbf{127}$, 10889 (2023)]. The use of unconventional data layouts and three variants of the MD algorithm allows integrals to be evaluated in double precision with sustained performance between 25% and 70% of the theoretical hardware peak. The performance assessment includes integrals over AOs with $l\leq 6$ (higher $l$ is supported). A preliminary implementation of the Hartree-Fock exchange operator is presented and assessed for computations with up to a quadruple-zeta basis and more than 20,000 AOs. The corresponding C++ code is part of the experimental open-source $\mathtt{LibintX}$ library available at $\mathbf{github.com:ValeevGroup/LibintX}$.