When MPI-parallel simulations run on shared Kubernetes clusters, conventional CPU scheduling leaves the vast majority of provisioned cycles idle at synchronization barriers. This paper presents a multiplexing framework that reclaims this idle capacity by co-locating multiple simulations on the same cluster. PMPI-based duty-cycle profiling quantifies the per-rank idle fraction; proportional CPU allocation then lets a second simulation execute concurrently with minimal overhead, yielding 1.77x throughput. A Pareto sweep up to N=5 concurrent simulations shows throughput scaling to 3.74x, with a knee at N=3 that offers the best efficiency-cost trade-off. An analytical model with a single fitted parameter predicts these gains to within ±4%. A dynamic controller automates the full pipeline, from profiling through In-Place Pod Vertical Scaling (KEP-1287) to packing and fairness monitoring, and achieves 3.25x throughput for four simulations without manual intervention or pod restarts. To our knowledge, this is the first application of In-Place Pod Vertical Scaling to the CPU allocation of running MPI processes. Experiments on an AWS cluster with OpenFOAM CFD confirm that the results hold under both concentric and standard graph-based (Scotch) mesh partitioning.
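As a rough illustration of the profiling-to-allocation step, the sketch below turns a rank's measured compute and MPI wait times into a duty cycle and then into a CPU request in millicores. The function names, the 1000-millicore baseline, and the 50-millicore floor are hypothetical choices made for this example; the paper's actual allocation rule and analytical model are not specified here.

```python
import math


def duty_cycle(compute_seconds: float, mpi_wait_seconds: float) -> float:
    """Fraction of wall time a rank spends computing rather than waiting in MPI.

    Assumption: both times come from a PMPI-style wrapper that accumulates
    the time spent inside MPI calls separately from the rest of the run.
    """
    total = compute_seconds + mpi_wait_seconds
    if total <= 0:
        raise ValueError("total measured time must be positive")
    return compute_seconds / total


def proportional_cpu_millicores(
    busy_fraction: float,
    full_millicores: int = 1000,   # hypothetical baseline: one full core per rank
    floor_millicores: int = 50,    # hypothetical floor so a rank is never starved
) -> int:
    """Scale a rank's CPU request to the fraction of time it actually computes."""
    if not 0.0 <= busy_fraction <= 1.0:
        raise ValueError("busy_fraction must be in [0, 1]")
    return max(floor_millicores, math.ceil(busy_fraction * full_millicores))


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the paper.
    busy = duty_cycle(compute_seconds=2.0, mpi_wait_seconds=6.0)
    print(busy, proportional_cpu_millicores(busy))
```

Under these assumptions, a rank that computes for a quarter of its wall time would request a quarter of a core, leaving the remainder available to a co-located simulation.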