We report an implementation of the McMurchie-Davidson (MD) algorithm for 3-center and 4-center 2-particle integrals over Gaussian atomic orbitals (AOs) with low and high angular momenta $l$ and varying degrees of contraction for graphics processing units (GPUs). This work builds upon our recent implementation of a matrix form of the MD algorithm that is efficient for GPU evaluation of 4-center 2-particle integrals over Gaussian AOs of high angular momenta ($l\geq 4$) [$\mathit{J. Phys. Chem. A}\ \mathbf{127}$, 10889 (2023)]. The use of unconventional data layouts and three variants of the MD algorithm allows integrals to be evaluated in double precision with sustained performance between 25% and 70% of the theoretical hardware peak. The performance assessment includes integrals over AOs with $l\leq 6$ (higher $l$ is supported). A preliminary implementation of the Hartree-Fock exchange operator is presented and assessed for computations with up to quadruple-zeta basis sets and more than 20,000 AOs. The corresponding C++ code is part of the experimental open-source $\mathtt{LibintX}$ library, available at $\mathbf{github.com:ValeevGroup/LibintX}$.