The storage and computing challenges expected in the next era of the Large Hadron Collider (LHC) give LHC experiments a strong motivation to rethink their computing models at many levels. Great effort has gone into optimizing computing resource utilization for data analysis, which leads both to lower hardware requirements and to faster turnaround for physics analyses. In this scenario, the Compact Muon Solenoid (CMS) collaboration is involved in several activities aimed at benchmarking different solutions for running High Energy Physics (HEP) analysis workflows. A promising solution is to evolve the software towards more user-friendly approaches featuring a declarative programming model and interactive workflows. The computing infrastructure should keep up with this trend by offering modern interfaces on the one hand and hiding the complexity of the underlying environment on the other, while efficiently leveraging the already deployed grid infrastructure and scaling toward opportunistic resources such as public clouds or HPC centers. This article presents the first example of using the ROOT RDataFrame technology to exploit such next-generation approaches for a production-grade CMS physics analysis. A new analysis facility has been created to offer users a modern interactive web interface based on JupyterLab that can leverage HTCondor-based grid resources at different geographical sites. The physics analysis has been converted from a legacy iterative approach to the modern declarative approach offered by RDataFrame and distributed over multiple computing nodes. The new scenario offers not only an overall improved programming experience, but also an order-of-magnitude speedup with respect to the previous approach.
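To make the contrast between the iterative and the declarative style concrete, the following is a minimal pure-Python sketch. It is not the ROOT RDataFrame API and not the analysis code described in this article: the `LazyFrame` class, its `define`, `filter` and `take` methods, the `n_chunks` parameter, the `pt1`/`pt2` columns and the 50.0 threshold are all hypothetical names and values chosen for illustration. The sketch only assumes that a declarative frame records operations first and runs the event loop later, which is what allows the same description to be applied to separate chunks of the input and merged.

```python
from typing import Callable, Dict, List, Tuple

Event = Dict[str, float]


def legacy_loop(events: List[Event]) -> List[float]:
    """Iterative style: the user writes the event loop by hand."""
    out = []
    for ev in events:
        pt_sum = ev["pt1"] + ev["pt2"]
        if pt_sum > 50.0:
            out.append(pt_sum)
    return out


class LazyFrame:
    """Toy declarative frame (hypothetical): steps are recorded, not executed."""

    def __init__(self, events: List[Event], steps: Tuple = ()):
        self._events = events
        self._steps = tuple(steps)

    def define(self, name: str, fn: Callable[[Event], float]) -> "LazyFrame":
        # Record a new derived column; nothing is computed yet.
        return LazyFrame(self._events, self._steps + (("define", name, fn),))

    def filter(self, pred: Callable[[Event], bool]) -> "LazyFrame":
        # Record a selection; nothing is computed yet.
        return LazyFrame(self._events, self._steps + (("filter", None, pred),))

    def take(self, column: str, n_chunks: int = 1) -> List[float]:
        # Split the input into chunks, run the recorded steps on each chunk,
        # and merge the partial results in order. Here the chunks run
        # sequentially; a distributed backend would send them to workers.
        size = max(1, -(-len(self._events) // n_chunks))  # ceiling division
        chunks = [self._events[i:i + size]
                  for i in range(0, len(self._events), size)]
        partial = [self._run(chunk, column) for chunk in chunks]
        return [value for part in partial for value in part]

    def _run(self, chunk: List[Event], column: str) -> List[float]:
        out = []
        for ev in chunk:
            row = dict(ev)  # copy so derived columns do not leak into the input
            keep = True
            for kind, name, fn in self._steps:
                if kind == "define":
                    row[name] = fn(row)
                elif not fn(row):
                    keep = False
                    break
            if keep:
                out.append(row[column])
        return out


events = [
    {"pt1": 30.0, "pt2": 25.0},
    {"pt1": 10.0, "pt2": 5.0},
    {"pt1": 40.0, "pt2": 20.0},
]

declarative = (
    LazyFrame(events)
    .define("pt_sum", lambda r: r["pt1"] + r["pt2"])
    .filter(lambda r: r["pt_sum"] > 50.0)
)
result = declarative.take("pt_sum", n_chunks=2)
```

In this toy, the declarative version states what to compute (a derived column and a selection) and leaves how the loop is run, including how the input is partitioned, to the frame. That separation is the design property the declarative model relies on when a workload is spread over multiple nodes.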