AgileOS: A GPU Operating System Layer for Protected CUDA Services

Modern GPU applications increasingly interact with storage systems, network devices, vendor libraries, and GPU-resident services rather than executing only isolated compute kernels. This shift creates a need for operating-system-like protection around GPU services: service metadata, device queues, memory-mapped I/O (MMIO) regions, and library-internal state should not be directly exposed to untrusted application kernels. However, today's CUDA programming model still gives each application, by default, direct ownership of its CUDA context, device pointers, runtime handles, module loading path, and kernel launches, leaving protected GPU services to build their own ad hoc interfaces and isolation mechanisms. This paper presents the initial design and prototype scope of AgileOS, a GPU operating-system layer for protected CUDA services. AgileOS virtualizes CUDA at the library boundary: applications link against client-side shims for the CUDA Runtime, the CUDA Driver, and selected libraries, while a trusted runtime worker owns the real CUDA context and mediates supported operations. To protect service state and module interfaces, AgileOS also defines a GPU memory-management model that separates user allocations from protected module/MMIO ranges, enforced through pointer validation and memory-access guards injected at the PTX level. AgileOS is modular, supporting a range of protected services and existing libraries such as cuFFT and PyTorch. The prototype includes client-side interceptors, worker-side CUDA handlers, virtualized CUDA object tables, protected AgileOS modules, the GPU memory manager, selected trusted library adapters, and the PTX-level kernel memory guard.
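
To make the separation between user allocations and protected module/MMIO ranges concrete, the following is a minimal host-side C++ sketch of the kind of pointer validation a trusted worker could perform before acting on a client-supplied device pointer. It is an illustration under stated assumptions, not the AgileOS implementation: the class `DeviceMemoryMap`, its methods, and the use of plain integer addresses are all hypothetical, and the abstract does not specify how the prototype or its PTX-level guard encodes these checks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical worker-side bookkeeping (names are assumptions, not AgileOS APIs).
// Addresses are modeled as plain integers so the sketch runs without a GPU.
class DeviceMemoryMap {
public:
    // Register a protected module/MMIO range [base, base + size).
    void add_protected(std::uintptr_t base, std::size_t size) {
        assert(size > 0 && base <= UINTPTR_MAX - size);
        protected_.push_back({base, size});
    }

    // Record a user allocation; refuse it if it is empty, wraps around the
    // address space, or overlaps any protected range.
    bool add_user_allocation(std::uintptr_t base, std::size_t size) {
        if (size == 0 || base > UINTPTR_MAX - size) return false;
        for (const Region& r : protected_) {
            if (base < r.base + r.size && r.base < base + size) return false;
        }
        user_[base] = size;
        return true;
    }

    // Accept [ptr, ptr + len) only if it lies entirely inside one recorded
    // user allocation. Anything else, including protected ranges, is rejected.
    bool validate(std::uintptr_t ptr, std::size_t len) const {
        if (len == 0) return false;
        auto it = user_.upper_bound(ptr);     // first allocation starting after ptr
        if (it == user_.begin()) return false;
        --it;                                 // allocation with greatest base <= ptr
        const std::uintptr_t offset = ptr - it->first;
        return offset < it->second && len <= it->second - offset;
    }

private:
    struct Region {
        std::uintptr_t base;
        std::size_t size;
    };
    std::vector<Region> protected_;
    std::map<std::uintptr_t, std::size_t> user_;  // base address -> size
};
```

The sketch validates by allow-listing: a pointer is accepted only when it falls within a known user allocation, so a protected range is rejected without being checked explicitly. A guard applied to individual kernel loads and stores would need a much cheaper test than a map lookup, which is one reason this host-side version should be read only as a statement of the intended invariant.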
