The world's largest particle accelerator, located at CERN, produces petabytes of data that must be analysed efficiently to study the fundamental structures of our universe. ROOT is an open-source C++ data analysis framework developed for this purpose. Its high-level data analysis interface, RDataFrame, currently supports only CPU parallelism. Given the increasing heterogeneity of computing facilities, efficient support for GPGPUs becomes crucial to take advantage of the available resources. SYCL allows for a single-source implementation, which enables support for different architectures. In this paper, we describe a CUDA implementation and its migration to SYCL, focusing on a core high-energy physics operation in RDataFrame: histogramming. We detail the challenges that we faced when integrating SYCL into a large and complex code base. Furthermore, we perform an extensive comparative performance analysis of two SYCL compilers, AdaptiveCpp and DPC++, against the reference CUDA implementation. We highlight the performance bottlenecks that we encountered and the methodology used to detect them. Based on our findings, we provide actionable insights for developers of SYCL applications.