The numerical precision of density-functional-theory (DFT) calculations depends on a variety of computational parameters, one of the most critical being the basis-set size. The ultimate precision is reached with an infinitely large basis set, i.e., in the limit of a complete basis set (CBS). Our aim in this work is to find a machine-learning model that extrapolates finite-basis-set calculations to the CBS limit. We start with a data set of 63 binary solids investigated with two all-electron DFT codes, exciting and FHI-aims, which employ very different types of basis sets. A quantile-random-forest model is used to estimate the total-energy correction with respect to a fully converged calculation as a function of the basis-set size. The random-forest model achieves a symmetric mean absolute percentage error below 25% for both codes and outperforms previous approaches in the literature. Our approach also provides prediction intervals, which quantify the uncertainty of the models' predictions.
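The two quantities the abstract relies on, the symmetric mean absolute percentage error (SMAPE) and quantile-random-forest prediction intervals, can be illustrated with a short sketch. It assumes the common SMAPE variant with a factor 1/2 in the denominator (the exact definition used in the study may differ) and the Meinshausen-style quantile-forest prediction step, in which training responses are weighted by how often they share a leaf with the query point. The tree fitting itself is assumed to be already done; the helper names `smape`, `qrf_weights`, and `weighted_quantile` are hypothetical, and the numbers in the usage example are illustrative, not data from the study.

```python
from typing import List, Sequence


def smape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Symmetric mean absolute percentage error, in percent.

    Assumed variant: 100/n * sum(|F - A| / ((|A| + |F|) / 2)).
    Other definitions (e.g. without the factor 1/2) also exist.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    total = 0.0
    for a, f in zip(actual, predicted):
        denom = (abs(a) + abs(f)) / 2.0
        # A prediction of exactly zero for a target of zero counts as no error.
        total += 0.0 if denom == 0 else abs(f - a) / denom
    return 100.0 * total / len(actual)


def qrf_weights(train_leaves: Sequence[Sequence[int]],
                query_leaves: Sequence[int]) -> List[float]:
    """Quantile-forest weights w_i(x) from leaf co-membership.

    train_leaves[t][i]: leaf id of training sample i in (already fitted) tree t.
    query_leaves[t]:    leaf id the query point x falls into in tree t.
    Each tree spreads weight 1 uniformly over the training samples in the
    query's leaf; the result is averaged over trees.
    """
    n_samples = len(train_leaves[0])
    weights = [0.0] * n_samples
    for leaves, q in zip(train_leaves, query_leaves):
        members = [i for i, leaf in enumerate(leaves) if leaf == q]
        for i in members:
            weights[i] += 1.0 / len(members)
    return [w / len(train_leaves) for w in weights]


def weighted_quantile(values: Sequence[float], weights: Sequence[float],
                      alpha: float) -> float:
    """Smallest y whose estimated conditional CDF F(y | x) reaches alpha."""
    pairs = sorted(zip(values, weights))
    total = sum(weights)
    cumulative = 0.0
    for y, w in pairs:
        cumulative += w
        # Small tolerance guards against floating-point round-off.
        if cumulative >= alpha * total - 1e-12:
            return y
    return pairs[-1][0]


if __name__ == "__main__":
    # Illustrative energy corrections of four training samples (arbitrary units).
    y_train = [0.10, 0.20, 0.40, 0.80]
    # Leaf ids of the training samples in two hypothetical trees.
    w = qrf_weights([[0, 0, 1, 1], [0, 1, 1, 2]], query_leaves=[0, 1])
    lower = weighted_quantile(y_train, w, 0.05)
    median = weighted_quantile(y_train, w, 0.50)
    upper = weighted_quantile(y_train, w, 0.95)
    print(f"median {median}, 90% interval [{lower}, {upper}]")
    print(f"SMAPE {smape([1.0, 2.0], [1.5, 2.0]):.1f}%")
```

Using the median as the point prediction and the 5% and 95% quantiles as interval bounds is one natural choice; the quantile levels actually used for the reported prediction intervals are not stated in the abstract.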