Portable acceleration of CMS computing workflows with coprocessors as a service

From arXiv; submitted to Computing and Software for Big Science. All figures and tables can be found at http://cms-results.web.cern.ch/cms-results/public-results/publications/MLG-23-001 (CMS Public Pages).

Computing demands for large scientific experiments, such as the CMS experiment at the CERN LHC, will increase dramatically in the next decades. To complement future performance increases of software running on central processing units (CPUs), the use of coprocessors in data processing holds great potential and is of considerable interest. Coprocessors are a class of computer processors that supplement CPUs, often executing certain functions more efficiently because of their architectural design. We explore the approach of Services for Optimized Network Inference on Coprocessors (SONIC) and study the deployment of this as-a-service approach in large-scale data processing. In these studies, we take a data processing workflow of the CMS experiment and run the main workflow on CPUs, while offloading several machine learning (ML) inference tasks onto either remote or local coprocessors, specifically graphics processing units (GPUs). With experiments performed at Google Cloud, the Purdue Tier-2 computing center, and combinations of the two, we demonstrate the acceleration of these ML algorithms individually on coprocessors and the corresponding throughput improvement for the entire workflow. This approach can easily be generalized to different types of coprocessors and can be deployed on local CPUs without decreasing throughput. We emphasize that the SONIC approach enables high coprocessor utilization and makes workflows portable across different types of coprocessors.
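Accelerating individual ML algorithms translates into a smaller gain for the entire workflow, because only part of each event's processing time is offloaded. One standard way to estimate this, not necessarily the method used in the paper, is Amdahl's law: if a fraction f of the per-event time is ML inference and offloading makes that part s times faster, the whole-workflow speedup is S = 1 / ((1 - f) + f / s). The sketch below assumes the CPU simply waits for each inference result; the function names and the values of f and s are hypothetical illustrations, not measurements from the CMS study.

```python
"""Amdahl's-law estimate of whole-workflow speedup from offloading ML inference.

Illustrative sketch only: the function names and the example values of the
inference fraction and speedup are hypothetical, not results from the study.
"""


def workflow_speedup(inference_fraction: float, inference_speedup: float) -> float:
    """Overall speedup when only the inference part of each event is accelerated.

    inference_fraction: share of per-event time spent in ML inference, in [0, 1].
    inference_speedup: factor by which that part becomes faster once offloaded
        (assumed to already include any network or queueing overhead).
    """
    if not 0.0 <= inference_fraction <= 1.0:
        raise ValueError("inference_fraction must be in [0, 1]")
    if inference_speedup <= 0.0:
        raise ValueError("inference_speedup must be positive")
    # Non-inference time is unchanged; inference time shrinks by the speedup factor.
    remaining = (1.0 - inference_fraction) + inference_fraction / inference_speedup
    return 1.0 / remaining


def max_speedup(inference_fraction: float) -> float:
    """Upper bound on the overall speedup as inference time goes to zero."""
    if not 0.0 <= inference_fraction <= 1.0:
        raise ValueError("inference_fraction must be in [0, 1]")
    if inference_fraction == 1.0:
        return float("inf")
    return 1.0 / (1.0 - inference_fraction)


if __name__ == "__main__":
    # Hypothetical case: inference takes 10% of the per-event time.
    f = 0.10
    for s in (2.0, 10.0, 100.0):
        print(f"f={f:.2f}, s={s:>5}: overall speedup = {workflow_speedup(f, s):.3f}")
    print(f"limit for f={f:.2f}: {max_speedup(f):.3f}")
```

With f = 0.10, even an arbitrarily fast coprocessor caps the overall gain at 1 / 0.9, about 1.11, which illustrates why whole-workflow improvements are typically much smaller than per-algorithm accelerations. The formula is a simplification: if inference calls are made asynchronously, the CPU can do other work while it waits, which the model above ignores.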
