Predicting the performance of infrastructure design options is not trivial for complex federated infrastructures, such as the Worldwide LHC Computing Grid (WLCG), whose computing sites are distributed over a wide area network and support a plethora of users and workflows. Due to the complexity and size of these infrastructures, it is not feasible to deploy large-scale experimental test-beds merely to compare and evaluate alternative designs. An alternative is to study the behaviour of these systems using simulation. This approach has been used successfully in the past to identify efficient and practical infrastructure designs for High Energy Physics (HEP). A prominent example is the Monarc simulation framework, which was used to study the initial structure of the WLCG. New simulation capabilities are needed to simulate large-scale heterogeneous computing systems with complex networks, data access and caching patterns. A modern tool, based on the SimGrid and WRENCH simulation frameworks, for simulating HEP workloads that execute on distributed computing infrastructures is outlined. Studies of its accuracy and scalability are presented using HEP as a case study. Hypothetical adjustments to prevailing computing architectures in HEP are studied, providing insights into the dynamics of a part of the WLCG and candidates for improvements.