Graphics Processing Units (GPUs) have become an integral part of High-Performance Computing (HPC) in the pursuit of exascale performance. GPU application developers aim to tune their code extensively to obtain optimal performance while making efficient use of the available resources. While extracting optimal performance from applications on HPC infrastructure, developers should also minimize energy usage, given the massive power consumption of data centres and HPC servers. This thesis presents two models that developers can use to analyse the energy dissipation of CUDA kernels. The first model predicts the execution time of a CUDA kernel. The kernel's PTX code is statically analysed to extract instruction features, control flow, and data dependences. We propose two scheduling approaches that satisfy the performance and hardware constraints. The second model is a static-analysis-based power prediction model built using machine learning techniques. Its features are derived from static analysis of PTX code and are chosen to expose the relationship between GPU power consumption and program features, which can aid developers in building energy-efficient, sustainable applications. The dataset used to validate both models includes kernels from different benchmark suites and of varying size, nature (e.g., compute-bound, memory-bound), and complexity (e.g., control divergence, memory access patterns). We also present a tool that demonstrates in practice the effectiveness and ease of use of the two models as design-assistance tools for GPU development.
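As a rough illustration of what static feature extraction from PTX can look like, the following minimal sketch counts instructions per coarse category in a PTX listing. It is not the thesis's implementation: the category map `CATEGORIES`, the function `extract_features`, and the `SAMPLE_PTX` kernel are all hypothetical names introduced here, and the abstract does not state which features the models actually use.

```python
import re
from collections import Counter

# Hypothetical opcode-to-category map, chosen only for illustration.
# The actual feature set of the thesis is not given in the abstract.
CATEGORIES = {
    "ld": "memory", "st": "memory", "atom": "memory",
    "add": "arithmetic", "sub": "arithmetic", "mul": "arithmetic",
    "mad": "arithmetic", "fma": "arithmetic", "div": "arithmetic",
    "bra": "control", "ret": "control", "call": "control", "bar": "control",
    "setp": "compare",
    "mov": "move", "cvt": "move", "cvta": "move",
}

# Optional predicate guard (e.g. "@%p1" or "@!%p1"), then the base opcode,
# then dot-separated modifiers, operands, and the terminating semicolon.
INSTR = re.compile(r"^\s*(?:@!?%\w+\s+)?([a-z]\w*)((?:\.\w+)*)\s*([^;]*);")


def extract_features(ptx: str) -> Counter:
    """Count PTX instructions per coarse category (assumed categories)."""
    counts = Counter()
    for line in ptx.splitlines():
        stripped = line.split("//", 1)[0].strip()  # drop trailing comments
        # Skip blanks, directives (".reg", ".entry", ...), labels, and braces.
        if (not stripped or stripped.startswith(".")
                or stripped.endswith(":") or stripped in ("{", "}")):
            continue
        match = INSTR.match(stripped)
        if match is None:
            continue
        counts[CATEGORIES.get(match.group(1), "other")] += 1
    return counts


# Small hand-written PTX fragment used only to exercise the sketch.
SAMPLE_PTX = """
.visible .entry scale(.param .u64 x)
{
    .reg .f32 %f<4>;
    ld.param.u64 %rd1, [x];
    ld.global.f32 %f1, [%rd1];
    mul.f32 %f2, %f1, 0f40000000;
    setp.lt.f32 %p1, %f2, 0f00000000;
    @%p1 bra DONE;
    st.global.f32 [%rd1], %f2;
DONE:
    ret;
}
"""

if __name__ == "__main__":
    print(dict(extract_features(SAMPLE_PTX)))
```

Counts of this kind could serve as inputs to a timing or power model. A fuller analysis would also need the control flow graph and data dependences mentioned above, which this sketch does not attempt.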