Scheduling real-time tasks that use GPUs with analyzable guarantees is a significant challenge because of the intricate interaction between CPU and GPU resources and the complexity of the GPU hardware and software stack. Although the real-time community has studied this problem extensively, several limitations persist. These include absent or limited preemption, long blocking times, and/or the need for extensive modifications to program code. In this paper, we propose two novel techniques, a kernel thread-based approach and an IOCTL-based approach, that enable preemptive priority-based scheduling of real-time GPU tasks. Both approaches control GPU context scheduling at the device driver level, so that access to the GPU follows task priorities and higher-priority tasks can preempt lower-priority ones. The kernel thread-based approach requires no modifications to user-level programs. The IOCTL-based approach needs only a single macro at the boundaries of each GPU access segment. In addition, we present a comprehensive response time analysis that accounts for overlaps between different task segments, which reduces pessimism in worst-case estimates. Through empirical evaluations and case studies, we demonstrate that the proposed approaches improve taskset schedulability and the timeliness of real-time tasks. Compared with prior work, they achieve up to 40\% higher schedulability while also exhibiting predictable worst-case behavior on Nvidia Jetson embedded platforms.