With the rapid advancement of Artificial Intelligence, the Graphics Processing Unit (GPU) has become essential across a growing number of safety-critical application domains. GPUs are indispensable for parallel computing; however, the complex data dependencies and resource contention among the kernels within a GPU task may delay its execution time unpredictably. To address this problem, this paper presents a scheduling and analysis method for Directed Acyclic Graph (DAG)-structured GPU tasks. Given a DAG representation, the proposed scheduler scales kernel-level parallelism and establishes inter-kernel dependencies to achieve a reduced and predictable DAG response time. The corresponding timing analysis yields a safe yet non-pessimistic makespan bound without any assumption on kernel priorities. The proposed method is implemented using only the standard CUDA API, requiring no additional software or hardware support. Experimental results on synthetic and real-world benchmarks demonstrate that the proposed approach reduces the worst-case makespan and the measured task execution time by up to 32.8% and 21.3%, respectively, compared to existing methods.
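The abstract states that inter-kernel dependencies are established through the standard CUDA API alone. A minimal sketch of that technique, using only standard CUDA streams and events, is shown below for a hypothetical three-kernel DAG (kernA preceding kernB and kernC); the kernel names, grid sizes, and DAG shape are illustrative assumptions, not the paper's actual scheduler or benchmarks.

```cuda
// Sketch: enforcing a DAG edge A -> {B, C} with standard CUDA streams
// and events, so B and C may run concurrently but never before A ends.
#include <cuda_runtime.h>

__global__ void kernA(float *x) { /* predecessor kernel (placeholder) */ }
__global__ void kernB(float *x) { /* successor kernel 1 (placeholder) */ }
__global__ void kernC(float *x) { /* successor kernel 2 (placeholder) */ }

int main() {
    float *buf;
    cudaMalloc(&buf, 1024 * sizeof(float));

    // Two streams give the two successor kernels a chance to overlap.
    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    // A timing-disabled event carries the dependency at low overhead.
    cudaEvent_t aDone;
    cudaEventCreateWithFlags(&aDone, cudaEventDisableTiming);

    // Record the event after A in stream s1, and make stream s2 wait on
    // it before launching C; B inherits the dependency by stream order.
    kernA<<<32, 256, 0, s1>>>(buf);
    cudaEventRecord(aDone, s1);
    cudaStreamWaitEvent(s2, aDone, 0);
    kernB<<<32, 256, 0, s1>>>(buf);
    kernC<<<32, 256, 0, s2>>>(buf);

    cudaDeviceSynchronize();
    cudaFree(buf);
    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaEventDestroy(aDone);
    return 0;
}
```

Because streams and events are part of the core CUDA runtime, a scheduler built this way needs no extra software layer or hardware support, which is consistent with the abstract's claim.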