This paper introduces a framework for solving alternating current optimal power flow (ACOPF) problems using graphics processing units (GPUs). While GPUs have demonstrated remarkable performance in various computing domains, their application to ACOPF has been limited by the difficulty of porting sparse automatic differentiation (AD) and sparse linear solver routines to GPUs. We address these issues with two key strategies. First, we utilize a single-instruction, multiple-data abstraction of nonlinear programs. This approach enables the specification of model equations while preserving their parallelizable structure and, in turn, facilitates a parallel AD implementation. Second, we employ a condensed-space interior-point method (IPM) with an inequality relaxation. This technique condenses the Karush--Kuhn--Tucker (KKT) system into a positive definite system. Its key advantage is that the KKT matrix can be factorized without numerical pivoting, a step that has hampered the parallelization of the IPM algorithm. By combining these strategies, we can perform the majority of operations on GPUs while keeping the data resident in device memory only. Comprehensive numerical benchmark results showcase the advantage of our approach. Remarkably, our implementations, MadNLP.jl and ExaModels.jl, running on NVIDIA GPUs achieve an order-of-magnitude speedup compared with state-of-the-art tools running on contemporary CPUs.
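To make the condensation step concrete, the following is a minimal sketch, not the MadNLP.jl implementation. It assumes a slack-variable KKT system with block rows (W + Σx) dx + Jᵀ dy = r1, Σs ds − dy = r2, and J dx − ds = r3, where W is positive semidefinite and Σx, Σs are positive diagonals. This block layout, the sign convention, and all function names are illustrative assumptions. Eliminating ds and dy leaves the condensed matrix K = W + Σx + Jᵀ Σs J, which is positive definite under these assumptions and can therefore be factorized by Cholesky without pivoting. The code is dense, pure Python, and runs on the CPU; it only illustrates the algebra.

```python
import math


def cholesky(A):
    """Return lower-triangular L with A = L L^T (no pivoting; A must be SPD)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def cholesky_solve(L, b):
    """Solve (L L^T) x = b by forward, then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def matvec(A, v):
    """Compute A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def matvec_t(A, v):
    """Compute A^T v."""
    return [sum(A[i][j] * v[i] for i in range(len(A))) for j in range(len(A[0]))]


def solve_condensed(W, sigma_x, sigma_s, J, r1, r2, r3):
    """Solve the assumed KKT system through the condensed positive definite system."""
    n, m = len(W), len(J)
    # Condensed matrix K = W + Sigma_x + J^T Sigma_s J.
    K = [[W[i][j] + sum(J[k][i] * sigma_s[k] * J[k][j] for k in range(m))
          for j in range(n)] for i in range(n)]
    for i in range(n):
        K[i][i] += sigma_x[i]
    # Condensed right-hand side: r1 + J^T (Sigma_s r3 + r2).
    t = [sigma_s[k] * r3[k] + r2[k] for k in range(m)]
    rhs = [a + b for a, b in zip(r1, matvec_t(J, t))]
    dx = cholesky_solve(cholesky(K), rhs)
    # Recover the eliminated variables from the second and third block rows.
    ds = [a - b for a, b in zip(matvec(J, dx), r3)]
    dy = [sigma_s[k] * ds[k] - r2[k] for k in range(m)]
    return dx, ds, dy


def kkt_residual(W, sigma_x, sigma_s, J, r1, r2, r3, dx, ds, dy):
    """Largest absolute residual of the three uncondensed block rows."""
    row1 = [a + sx * d + b - r for a, sx, d, b, r in
            zip(matvec(W, dx), sigma_x, dx, matvec_t(J, dy), r1)]
    row2 = [ss * s - y - r for ss, s, y, r in zip(sigma_s, ds, dy, r2)]
    row3 = [a - s - r for a, s, r in zip(matvec(J, dx), ds, r3)]
    return max(abs(v) for v in row1 + row2 + row3)
```

Calling `solve_condensed` on a small instance and passing the result to `kkt_residual` checks that the step recovered from the condensed system also satisfies the original, uncondensed block rows. In this sketch the only factorization is the Cholesky of K, which is the property the abstract identifies as enabling pivoting-free factorization.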