We present a distributed framework for the Primal-Dual Hybrid Gradient (PDHG) algorithm for solving massive-scale linear programming (LP) problems. Although PDHG-based solvers demonstrate strong performance on single-node GPU architectures, their applicability to industrial-scale instances is often limited by single-GPU computational throughput. To overcome this limitation, we propose D-PDLP, the first distributed PDLP framework, which extends PDHG to a multi-GPU setting via a practical two-dimensional grid partitioning of the constraint matrix. To improve load balance and computational efficiency, we introduce a block-wise random permutation strategy combined with nonzero-aware matrix partitioning. By distributing the intensive computation required in PDHG iterations, the proposed framework harnesses multi-GPU parallelism to achieve substantial speedups with relatively low communication overhead. Extensive experiments on standard LP benchmarks (including MIPLIB and Mittelmann instances) as well as huge-scale real-world datasets show that our distributed implementation, built upon cuPDLPx, achieves strong scalability and high performance while preserving full FP64 numerical accuracy.
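The permutation-then-partition idea can be illustrated with a minimal sketch. Note the assumptions: `permute_and_partition` is a hypothetical helper, not part of the D-PDLP or cuPDLPx API; it uses a plain row/column permutation rather than the paper's block-wise variant, and splits the matrix into equal-sized tiles rather than nonzero-aware ones.

```python
import numpy as np
from scipy import sparse


def permute_and_partition(A, grid_rows, grid_cols, seed=0):
    """Sketch of 2D grid partitioning with random permutation.

    Hypothetical helper: randomly permutes the rows and columns of a
    sparse constraint matrix A (to spread nonzeros more evenly), then
    splits it into a grid_rows x grid_cols grid of blocks, one per GPU.
    The actual D-PDLP framework permutes block-wise and sizes the tiles
    by nonzero count; this sketch uses equal index ranges for clarity.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Random row/column permutation to break up dense clusters.
    P = A[rng.permutation(m)][:, rng.permutation(n)].tocsr()
    # Equal-width index ranges along each grid dimension.
    row_parts = np.array_split(np.arange(m), grid_rows)
    col_parts = np.array_split(np.arange(n), grid_cols)
    # One sparse block per (row, col) grid cell.
    return [[P[r][:, c] for c in col_parts] for r in row_parts]
```

In a PDHG iteration, GPU (i, j) would then apply its block (and its transpose) locally, with reductions along grid rows and columns assembling the full matrix-vector products; the permutation makes the per-block nonzero counts, and hence the per-GPU work, roughly balanced.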