Gaussian processes are a powerful tool for modeling continuous fields, but their naive $\mathcal{O}(N^3)$ computational cost and $\mathcal{O}(N^2)$ memory requirement often limit their practical use. Vecchia's approximation is a sparse precision matrix approximation for stationary, decaying kernels that conditions each point only on its $k$ nearest neighbors. We present GraphGP, a GPU algorithm for Vecchia's approximation that scales to nearly a billion parameters with linear time and memory requirements, handling arbitrary point distributions over a large dynamic range. Our key contributions are (1) a bit-reversed k-d tree ordering that allows efficient neighbor searches while also maximizing batch parallelism, and (2) a differentiable CUDA implementation, which is substantially faster and more memory efficient than our pure JAX baseline. GraphGP provides the building blocks for inference, including forward generation, inverse application, log-determinant, and kernel parameter derivatives.
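To make the conditioning structure concrete, here is a minimal NumPy sketch of a Vecchia log-density for a zero-mean process. It is not GraphGP's implementation: the function names (`exp_kernel`, `vecchia_logpdf`, `exact_logpdf`), the exponential kernel, and the jitter value are assumptions chosen for illustration. The sketch takes the point ordering as given and finds neighbors by brute force among earlier points, so it runs in quadratic time and does not reproduce the bit-reversed k-d tree ordering or the linear scaling described above.

```python
import numpy as np

def exp_kernel(x, y, lengthscale=0.3, variance=1.0):
    """Stationary, decaying exponential kernel between x (n, d) and y (m, d)."""
    d = np.sqrt(np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1))
    return variance * np.exp(-d / lengthscale)

def vecchia_logpdf(points, values, k, kernel=exp_kernel, jitter=1e-9):
    """Sum over i of log p(f_i | f at the k nearest earlier points)."""
    n = len(points)
    total = 0.0
    for i in range(n):
        xi = points[i:i + 1]
        var = kernel(xi, xi)[0, 0] + jitter  # prior variance of point i
        mean = 0.0
        if i > 0 and k > 0:
            # Brute-force search restricted to points earlier in the ordering.
            dist = np.linalg.norm(points[:i] - points[i], axis=1)
            nbrs = np.argsort(dist)[:k]
            K_nn = kernel(points[nbrs], points[nbrs]) + jitter * np.eye(len(nbrs))
            k_in = kernel(xi, points[nbrs])[0]
            w = np.linalg.solve(K_nn, k_in)  # conditioning weights
            mean = w @ values[nbrs]          # conditional mean
            var = var - w @ k_in             # conditional variance
        total += -0.5 * (np.log(2 * np.pi * var) + (values[i] - mean) ** 2 / var)
    return total

def exact_logpdf(points, values, kernel=exp_kernel, jitter=1e-9):
    """Dense O(N^3) reference log-density, for comparison on small N."""
    n = len(points)
    K = kernel(points, points) + jitter * np.eye(n)
    L = np.linalg.cholesky(K)
    z = np.linalg.solve(L, values)
    return -0.5 * z @ z - np.sum(np.log(np.diag(L))) - 0.5 * n * np.log(2 * np.pi)
```

When `k` is at least `N - 1`, every point conditions on all earlier points and the sum reduces to the chain-rule factorization of the dense Gaussian density, so `vecchia_logpdf` should agree with `exact_logpdf` up to round-off. Smaller `k` trades accuracy for a fixed-size linear solve per point, which is what makes the per-point work independent of `N` once neighbors are known.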