The computing resource needs of the LHC experiments are expected to continue growing significantly during Run 3 and into the HL-LHC era. The landscape of available resources will also evolve, as High Performance Computing (HPC) and Cloud resources will provide a comparable, or even dominant, fraction of the total compute capacity. The coming years therefore present a challenge for the experiments' resource provisioning models, both in terms of scalability and of increasing complexity. The CMS Submission Infrastructure (SI) provisions computing resources for CMS workflows. This infrastructure is built on a set of federated HTCondor pools, currently aggregating 400k CPU cores distributed worldwide and supporting the simultaneous execution of over 200k computing tasks. Incorporating HPC resources into CMS computing represents, firstly, an integration challenge, as HPC centers are much more diverse than Grid sites. Secondly, evolving the present SI, dimensioned to harness the current CMS computing capacity, to reach the resource scales required for the HL-LHC phase, while maintaining global flexibility and efficiency, will represent an additional challenge for the SI. To preemptively address potential future scalability limits, the SI team regularly runs tests to explore the maximum reach of our infrastructure. In this note, the integration of HPC resources into CMS offline computing is summarized, the potential concerns for the SI derived from the increased scale of operations are described, and the most recent results of scalability tests on the CMS SI are reported.
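As an illustration of the kind of aggregation the SI performs over its federated pools, the following is a minimal sketch of summing CPU-core counts across slot advertisements. Slot ClassAds are modeled here as plain dictionaries with hypothetical example data; in a real deployment one would query each pool's collector (e.g. via the htcondor Python bindings) rather than use in-memory data.

```python
# Illustrative sketch: totaling CPU cores across federated HTCondor pools.
# Slot ClassAds are represented as plain dicts; pool names and the sample
# slot data below are hypothetical, for demonstration only.

def total_cores(pools):
    """Sum the Cpus attribute over every slot ad in every pool."""
    return sum(ad.get("Cpus", 0) for slots in pools.values() for ad in slots)

# Hypothetical example data: two pools with a few slots each.
pools = {
    "Global": [{"Cpus": 8}, {"Cpus": 16}],
    "CERN":   [{"Cpus": 32}],
}

print(total_cores(pools))  # total cores aggregated across all pools
```

In production, the per-pool slot lists would come from collector queries, but the aggregation logic stays the same.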