Deploying complex, distributed scientific workflows across diverse HPC sites is often hindered by site-specific dependencies and complex build environments. This paper investigates the design and performance of portable HPC container images that encapsulate MPI- and CUDA-enabled software stacks without sacrificing bare-metal performance. The work was carried out within the EBRAINS Research Infrastructure to evaluate portable, Apptainer-based HPC container images for the EBRAINS Software Distribution (ESD), a Spack-based software ecosystem comprising approximately 80 top-level packages and 800 dependencies. On two production HPC clusters, Karolina and Jureca-DC, we evaluate a hybrid, PMIx-based containerization strategy using Apptainer. This strategy avoids site-specific builds by dynamically leveraging host-level specialized hardware, such as network interfaces and GPUs. We demonstrate that portable, MPI- and CUDA-enabled scientific software can be built into container images that correctly use site-installed drivers and hardware and reproduce bare-metal communication behavior. Using communication microbenchmarks (e.g., OSU and NCCL) together with performance metrics from neuroscience applications, we measure the performance of the containerized deployments and verify it against bare-metal runs. Crucially, our verification goes beyond top-level runtime measurements: we analyze low-level debug logs to actively detect misbehavior and misconfigurations, such as suboptimal transport pathways. Ultimately, this investigation demonstrates the feasibility of a simple, reproducible methodology for decoupling software environments from the underlying infrastructure, paving the way for automated pipelines that ensure optimized, performance-verified execution across varied HPC architectures.
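To illustrate what log-based verification of transport pathways can look like, the following is a minimal sketch, not the paper's tooling. It assumes the job was run with `NCCL_DEBUG=INFO`. It also assumes that NCCL reports its selected inter-node network in lines of the form `NCCL INFO Using network <Name>` (for example, `IB` for InfiniBand or `Socket` for the TCP fallback). The function name `check_nccl_transport` and the expected value `"IB"` are hypothetical choices for this example.

```python
import re
from dataclasses import dataclass

# Assumed line shape emitted with NCCL_DEBUG=INFO: "... NCCL INFO Using network <Name>".
# Only the first whitespace-delimited token of <Name> is captured.
NETWORK_RE = re.compile(r"NCCL INFO Using network (\S+)")


@dataclass
class TransportReport:
    networks: set   # distinct network transports reported across all ranks
    ok: bool        # True only if every report matches the expected transport


def check_nccl_transport(log_text: str, expected: str = "IB") -> TransportReport:
    """Flag runs where NCCL silently fell back to a different (e.g. TCP socket) transport."""
    found = {m.group(1) for m in NETWORK_RE.finditer(log_text)}
    # A missing report is treated as a failure too: it usually means the debug
    # output was not captured, so the transport cannot be verified.
    ok = bool(found) and found == {expected}
    return TransportReport(networks=found, ok=ok)


if __name__ == "__main__":
    import sys
    report = check_nccl_transport(sys.stdin.read())
    print(f"networks={sorted(report.networks)} ok={report.ok}")
    sys.exit(0 if report.ok else 1)
```

Because the script exits non-zero on a mismatch, a check like this could run as a gate in an automated pipeline after each containerized benchmark. The same idea extends to MPI/UCX debug output, although the relevant log patterns would have to be confirmed for each library version.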