Optimizing the performance of computational fluid dynamics (CFD) applications accelerated by graphics processing units (GPUs) is crucial for efficient simulations. In this study, we use a machine learning-based autotuning technique to optimize 14 parameters related to GPU kernel scheduling, including the number of thread blocks and the number of threads per block. The underlying model is a fully connected neural network that takes the tuning parameters as inputs and predicts the measured execution time of a simulation as its output. To assess the approach, we conducted experiments on three types of GPUs with computational speeds ranging from low to high, training independently for each GPU model and also jointly across multiple GPU models. The autotuner successfully tuned this wide range of parameters and improved the performance of a CFD code while requiring samples from only a small fraction of the large parameter search space. We attribute this sample efficiency to the ability of fully connected neural networks to capture the complex relationships between parameter settings and the resulting performance. Overall, the study demonstrates the potential of machine learning, and of fully connected neural networks in particular, for autotuning GPU-accelerated CFD codes, helping researchers and practitioners reach high performance in scientific simulations through optimized parameter configurations.
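The workflow described above (time a small sample of configurations, fit a fully connected network that maps parameters to execution time, then use the network to rank untested configurations) can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the timing function `measure_time`, the candidate values in `CHOICES`, the network size, the sample counts, and the candidate pool are all hypothetical assumptions chosen so that the example runs on its own.

```python
import numpy as np

rng = np.random.default_rng(0)

N_PARAMS = 14                                      # number of tuning parameters in the study
CHOICES = np.array([32, 64, 128, 256, 512, 1024])  # hypothetical values per parameter

def measure_time(cfg):
    """Hypothetical stand-in for running and timing the CFD code with `cfg`."""
    x = np.log2(cfg)
    return 1.0 + 0.01 * float(np.sum((x - 7.0) ** 2))

def encode(cfgs):
    """Map raw parameter values to network inputs in roughly [-1, 1]."""
    return (np.log2(cfgs) - 7.5) / 2.5

# 1. Time a small random sample of the 6**14 possible configurations.
train_cfgs = rng.choice(CHOICES, size=(500, N_PARAMS))
times = np.array([measure_time(c) for c in train_cfgs])
X = encode(train_cfgs)
y = ((times - times.mean()) / times.std())[:, None]  # standardized targets

# 2. Fully connected network: 14 inputs -> 32 tanh units -> 1 output.
H = 32
params = {
    "W1": rng.normal(0, 1 / np.sqrt(N_PARAMS), (N_PARAMS, H)),
    "b1": np.zeros(H),
    "W2": rng.normal(0, 1 / np.sqrt(H), (H, 1)),
    "b2": np.zeros(1),
}

def forward(p, X):
    h = np.tanh(X @ p["W1"] + p["b1"])
    return h, h @ p["W2"] + p["b2"]

def loss_and_grads(p, X, y):
    """Mean squared error and its gradients via manual backpropagation."""
    h, pred = forward(p, X)
    err = pred - y
    loss = float(np.mean(err ** 2))
    d_pred = 2 * err / len(X)
    d_h = (d_pred @ p["W2"].T) * (1 - h ** 2)
    grads = {
        "W1": X.T @ d_h, "b1": d_h.sum(axis=0),
        "W2": h.T @ d_pred, "b2": d_pred.sum(axis=0),
    }
    return loss, grads

# Full-batch training with the Adam update rule.
m1 = {k: np.zeros_like(val) for k, val in params.items()}
m2 = {k: np.zeros_like(val) for k, val in params.items()}
lr, beta1, beta2, eps = 0.01, 0.9, 0.999, 1e-8
initial_loss, _ = loss_and_grads(params, X, y)
for step in range(1, 2001):
    _, grads = loss_and_grads(params, X, y)
    for k in params:
        m1[k] = beta1 * m1[k] + (1 - beta1) * grads[k]
        m2[k] = beta2 * m2[k] + (1 - beta2) * grads[k] ** 2
        m_hat = m1[k] / (1 - beta1 ** step)
        v_hat = m2[k] / (1 - beta2 ** step)
        params[k] -= lr * m_hat / (np.sqrt(v_hat) + eps)
final_loss, _ = loss_and_grads(params, X, y)

# 3. Rank a pool of untested configurations by predicted time, then time the best one.
cand_cfgs = rng.choice(CHOICES, size=(20000, N_PARAMS))
_, cand_pred = forward(params, encode(cand_cfgs))
best_cfg = cand_cfgs[np.argmin(cand_pred[:, 0])]
tuned_time = measure_time(best_cfg)

print("training loss:", round(initial_loss, 4), "->", round(final_loss, 4))
print("predicted-best configuration:", best_cfg)
print("its measured time:", round(tuned_time, 3),
      "| median sampled time:", round(float(np.median(times)), 3))
```

In a real setting, `measure_time` would launch the GPU-accelerated simulation with the given kernel-scheduling parameters and return its wall-clock time. Only the 500 sampled configurations and the final recommendation are ever timed here, which is how a surrogate model keeps the cost far below an exhaustive search of the parameter space.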