This paper introduces a novel computational framework for solving alternating current optimal power flow (AC OPF) problems on graphics processing units (GPUs). While GPUs have demonstrated remarkable performance in various computing domains, their application to AC OPF has been limited by the difficulty of porting sparse automatic differentiation (AD) and sparse linear solver routines to GPUs. We address these issues with two key strategies. First, we use a single-instruction, multiple-data (SIMD) abstraction of nonlinear programs (NLPs). This abstraction lets model equations be specified in a way that preserves their parallelizable structure, which in turn facilitates the implementation of AD routines that exploit that structure. Second, we employ a condensed-space interior-point method (IPM) with an inequality relaxation strategy. This technique relaxes equality constraints to inequalities and condenses the Karush-Kuhn-Tucker (KKT) system into a much smaller positive definite system. Its key advantage is that the condensed KKT matrix can be factorized without numerical pivoting, a step that has in the past hampered the parallelization of the IPM algorithm. By combining these two strategies, we can perform the majority of operations on GPUs while keeping the data resident in device memory. Comprehensive numerical benchmark results showcase the substantial computational advantage of our approach. Remarkably, for solving large-scale AC OPF problems to moderate accuracy, our implementations, MadNLP.jl and ExaModels.jl, running on NVIDIA GPUs achieve an order of magnitude speedup compared to state-of-the-art tools running on contemporary CPUs.
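As a rough illustration of the condensation idea described above, the following pure-Python sketch eliminates the dual block of a small KKT-like system and solves the resulting positive definite system with a Cholesky factorization, which requires no pivoting. It is not taken from MadNLP.jl or ExaModels.jl. The notation (`H` for the regularized Hessian block, `J` for the constraint Jacobian, `D` for a positive diagonal scaling arising from the inequality slacks), the helper names, and the numerical data are all assumptions made for this example.

```python
import math


def cholesky(A):
    """Return lower-triangular L with A = L L^T. No pivoting is performed."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = A[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def cholesky_solve(L, b):
    """Solve (L L^T) x = b by forward and backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(L[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - s) / L[i][i]
    return x


def condensed_step(H, J, D, r1, r2):
    """Solve [[H, J^T], [J, -D^{-1}]] [dx; dy] = [r1; r2] by condensation.

    Assumed shapes: H is n x n and positive definite, J is m x n,
    D is a list of m positive diagonal entries.
    """
    n, m = len(H), len(J)
    # Condensed matrix K = H + J^T D J (n x n, positive definite).
    K = [[H[i][j] + sum(J[k][i] * D[k] * J[k][j] for k in range(m))
          for j in range(n)] for i in range(n)]
    # Condensed right-hand side: r1 + J^T D r2.
    rhs = [r1[i] + sum(J[k][i] * D[k] * r2[k] for k in range(m))
           for i in range(n)]
    dx = cholesky_solve(cholesky(K), rhs)
    # Recover the eliminated block: dy = D (J dx - r2).
    dy = [D[k] * (sum(J[k][j] * dx[j] for j in range(n)) - r2[k])
          for k in range(m)]
    return dx, dy


def kkt_residual(H, J, D, r1, r2, dx, dy):
    """Largest absolute residual of the uncondensed system."""
    n, m = len(H), len(J)
    top = [sum(H[i][j] * dx[j] for j in range(n))
           + sum(J[k][i] * dy[k] for k in range(m)) - r1[i]
           for i in range(n)]
    bot = [sum(J[k][j] * dx[j] for j in range(n)) - dy[k] / D[k] - r2[k]
           for k in range(m)]
    return max(abs(v) for v in top + bot)


# Illustrative data only (hypothetical, not from any benchmark case).
H = [[4.0, 1.0, 0.0], [1.0, 3.0, 0.0], [0.0, 0.0, 2.0]]
J = [[1.0, 1.0, 1.0], [1.0, -1.0, 0.0]]
D = [10.0, 5.0]
r1 = [1.0, 2.0, 3.0]
r2 = [0.5, -0.5]

dx, dy = condensed_step(H, J, D, r1, r2)
print("residual of full system:", kkt_residual(H, J, D, r1, r2, dx, dy))
```

The sketch uses dense lists for readability. The point it is meant to convey is only that, once every constraint contributes a positive diagonal entry in `D`, the condensed matrix `H + J^T D J` is positive definite and can be factorized in a fixed elimination order, which is the property that makes such a factorization amenable to parallel hardware.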