Today, LHC offline computing relies heavily on CPU resources, despite the interest in compute accelerators such as GPUs for the longer-term future. The number of cores per CPU socket has continued to increase steadily, reaching 64 cores (128 threads) with recent AMD EPYC processors and 128 cores with Ampere Altra Max ARM processors. Over the past decade, the CMS data processing framework, CMSSW, has been transformed from a single-threaded framework into a highly concurrent one. The first multithreaded version was brought into production by the start of LHC Run 2 in 2015. Since then, the framework's threading efficiency has gradually been improved by adding more levels of concurrency and reducing the number of serial code paths. The latest addition was support for concurrent Runs. In this work we review the concurrency model of CMSSW and measure its scalability with real CMS applications, such as simulation and reconstruction, on modern many-core machines. We show metrics such as event-processing throughput and application memory usage, with and without the contribution of I/O, as I/O has been the major scaling limitation for CMS applications.