This paper introduces a parallel CUDA/C++ implementation of the Gaussian process with a decomposed kernel. This recent formulation, introduced by Joukov and Kulić (2022), requires inverting an approximate but much smaller matrix than a plain Gaussian process does. However, it has a limitation: its execution time degrades when dealing with higher-dimensional samples. The solution presented in this paper parallelizes the computation of the predictive posterior statistics on a GPU using CUDA and its libraries. The CPU and GPU implementations are then benchmarked on different CPU-GPU configurations to show the benefits of the parallel GPU implementation over the CPU one.