The Latin American Giant Observatory (LAGO) project relies on extensive High-Performance Computing (HPC) resources for complex astroparticle physics simulations, making resource efficiency critical for scientific productivity and sustainability. This article presents a detailed analysis aimed at quantifying and improving HPC resource utilization efficiency within the LAGO computational environment. The core objective is to understand how LAGO's distinct computational workloads, characterized by a prevalent coarse-grained, task-parallel execution model, consume resources in practice. To achieve this, we analyze historical job accounting data from the EGI FedCloud platform, identify the primary workload categories (Monte Carlo simulations, data processing, and user analysis/testing), and evaluate their performance using key efficiency metrics: CPU utilization, walltime utilization, and I/O patterns. Our analysis reveals significant patterns, including high CPU efficiency within individual simulation tasks contrasted with the distorting impact of short test jobs on aggregate metrics. This work pinpoints specific inefficiencies and provides data-driven insight into LAGO's HPC usage. The findings directly inform recommendations for optimizing resource requests, refining workflow management strategies, and guiding future efforts to increase computational throughput, ultimately maximizing the scientific return on LAGO's HPC investments.
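The CPU-efficiency analysis described above can be sketched with a small computation over accounting records. This is a minimal illustration only: the field names and sample values below are hypothetical placeholders, not the actual EGI FedCloud accounting schema or LAGO data. It shows the core metric (CPU time consumed divided by CPU time reserved) and why short, mostly-idle test jobs distort a per-job average while barely moving a resource-weighted aggregate.

```python
from dataclasses import dataclass

# Hypothetical accounting record; real EGI FedCloud accounting exports
# use different field names, so treat these as illustrative placeholders.
@dataclass
class JobRecord:
    job_id: str
    category: str        # e.g. "simulation", "processing", "test"
    cores: int
    walltime_s: float    # elapsed wall-clock seconds
    cputime_s: float     # total CPU seconds summed over all cores

def cpu_efficiency(job: JobRecord) -> float:
    """CPU efficiency of one job: CPU time used / CPU time reserved."""
    reserved = job.cores * job.walltime_s
    return job.cputime_s / reserved if reserved > 0 else 0.0

def mean_efficiency(jobs: list[JobRecord]) -> float:
    """Unweighted per-job mean: every job counts equally, so many
    short test jobs can drag this figure down sharply."""
    return sum(cpu_efficiency(j) for j in jobs) / len(jobs)

def weighted_efficiency(jobs: list[JobRecord]) -> float:
    """Resource-weighted aggregate: total CPU time used divided by
    total core-seconds reserved across the whole workload."""
    used = sum(j.cputime_s for j in jobs)
    reserved = sum(j.cores * j.walltime_s for j in jobs)
    return used / reserved if reserved > 0 else 0.0

# Synthetic example: one long, efficient simulation and one short,
# mostly idle test job (values are invented for illustration).
jobs = [
    JobRecord("sim-001", "simulation", 8, 36_000, 276_000),
    JobRecord("tst-001", "test",       8,    120,     110),
]
print(f"per-job mean:      {mean_efficiency(jobs):.2f}")      # ~0.54
print(f"resource-weighted: {weighted_efficiency(jobs):.2f}")  # ~0.96
```

The gap between the two aggregates mirrors the finding in the abstract: individual simulation tasks run at high CPU efficiency, but unweighted summaries are skewed by short test jobs.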