Computing multivariate normal (MVN) probabilities in high dimensions is a challenging statistical problem, and solving it efficiently can benefit a wide range of applications. A common approach to computing high-dimensional MVN probabilities is the Separation-of-Variables (SOV) algorithm. This algorithm has a high computational complexity of $O(n^3)$ and a space complexity of $O(n^2)$, mainly due to the Cholesky factorization of the $n \times n$ covariance matrix, where $n$ is the dimensionality of the MVN problem. This work proposes a high-performance computing framework that scales the SOV algorithm and, consequently, the confidence region detection algorithm. The framework combines parallel linear algebra algorithms with a task-based programming model to achieve scalable performance when computing these probabilities, especially on large-scale systems. In addition, we incorporate Tile Low-Rank (TLR) approximation techniques to reduce the algorithmic complexity while preserving the required accuracy. We evaluate the performance and accuracy of the framework using simulated data and a wind speed dataset. Our implementation efficiently handles high-dimensional MVN probability computations on shared- and distributed-memory systems using finite-precision arithmetic and TLR approximation. Performance results show a speedup of up to 20× in solving the MVN problem with TLR approximation compared to the reference dense solution, without sacrificing the application's accuracy. Qualitative results on synthetic and real datasets show that confidence region detection remains highly accurate even when the underlying linear algebra operations rely on TLR approximation.
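To make the role of the Cholesky factorization in SOV concrete, the following is a minimal sketch of the classical separation-of-variables estimator (Genz's formulation) with plain Monte Carlo sampling, assuming a zero-mean MVN and a small dense covariance matrix. The function name `sov_mvn_prob` and its parameters are hypothetical, and the sketch uses NumPy/SciPy on a single node rather than the task-based, tiled, or TLR framework described above. The `np.linalg.cholesky` call is the $O(n^3)$ step; everything after it only reads rows of the lower-triangular factor `L`.

```python
import numpy as np
from scipy.stats import norm


def sov_mvn_prob(a, b, sigma, n_samples=20000, seed=0):
    """Estimate P(a <= X <= b) for X ~ N(0, sigma) with Genz's SOV method.

    Hypothetical illustration: plain Monte Carlo, no variable reordering,
    dense NumPy Cholesky instead of a tiled or TLR factorization.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = a.size
    L = np.linalg.cholesky(sigma)  # O(n^3) time, O(n^2) memory
    rng = np.random.default_rng(seed)
    w = rng.random((n_samples, n - 1))  # uniform samples driving the integration
    y = np.zeros((n_samples, n))

    # First variable: its integration limits do not depend on any samples.
    d = np.full(n_samples, norm.cdf(a[0] / L[0, 0]))
    e = np.full(n_samples, norm.cdf(b[0] / L[0, 0]))
    f = e - d

    eps = 1e-15  # keep ppf arguments inside (0, 1) to avoid +/-inf
    for i in range(1, n):
        # Draw y_{i-1} from the standard normal truncated to its current limits.
        u = np.clip(d + w[:, i - 1] * (e - d), eps, 1.0 - eps)
        y[:, i - 1] = norm.ppf(u)
        # Shift the limits of variable i by the contribution of earlier variables.
        s = y[:, :i] @ L[i, :i]
        d = norm.cdf((a[i] - s) / L[i, i])
        e = norm.cdf((b[i] - s) / L[i, i])
        f = f * (e - d)  # running product of conditional probabilities

    return f.mean()


if __name__ == "__main__":
    rho = 0.5
    sigma = np.array([[1.0, rho], [rho, 1.0]])
    # Orthant probability P(X1 <= 0, X2 <= 0); the exact value is 1/3 for rho = 0.5.
    print(sov_mvn_prob([-np.inf, -np.inf], [0.0, 0.0], sigma, n_samples=200000))
```

For a bivariate orthant with correlation $\rho$, the exact probability $1/4 + \arcsin(\rho)/(2\pi)$ gives a convenient check. With a diagonal covariance, each conditional limit no longer depends on the samples, so the estimator returns the exact product of one-dimensional probabilities.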