Predicting the performance of different infrastructure design options is not trivial for complex federated infrastructures, such as the Worldwide LHC Computing Grid (WLCG), whose computing sites are distributed over a wide area network and support a plethora of users and workflows. Because of the complexity and size of these infrastructures, it is not feasible to deploy large-scale experimental test-beds merely to compare and evaluate alternative designs. An alternative is to study the behaviour of these systems using simulation. This approach has been used successfully in the past to identify efficient and practical infrastructure designs for High Energy Physics (HEP). A prominent example is the MONARC simulation framework, which was used to study the initial structure of the WLCG. New simulation capabilities are needed to simulate large-scale heterogeneous computing systems with complex networks, data access and caching patterns. A modern tool for simulating HEP workloads that execute on distributed computing infrastructures, based on the SimGrid and WRENCH simulation frameworks, is outlined. Studies of its accuracy and scalability are presented using HEP as a case study. Hypothetical adjustments to prevailing computing architectures in HEP are studied, providing insights into the dynamics of a part of the WLCG and identifying candidates for improvement.