In this paper we describe the development of a streamlined framework for large-scale ATLAS pMSSM reinterpretations of LHC Run-2 analyses using containerised computational workflows. The project aims to assess the global coverage of BSM physics and requires running O(5k) computational workflows representing pMSSM model points. Following ATLAS Analysis Preservation policies, many analyses have been preserved as containerised Yadage workflows and, after validation, were added to a curated selection for the pMSSM study. To run the workflows at scale, we used the REANA reusable analysis platform. We describe how the REANA platform was enhanced through internal service scheduling changes to maximise concurrent throughput. We discuss the scalability of the approach on Kubernetes clusters ranging from 500 to 5000 cores. Finally, we demonstrate the feasibility of using additional ad-hoc public cloud resources by running the same workflows on the Google Cloud Platform.