Modern GPU applications, such as machine learning (ML) frameworks, use only part of the capacity of high-end GPUs, which leads to GPU underutilization in cloud environments. Sharing GPUs across multiple applications from different users can improve resource utilization and, consequently, cost, energy, and power efficiency. However, GPU sharing raises memory safety concerns because kernels from different applications must share a single GPU address space (GPU context). Previous GPU memory protection approaches have limited deployability because they require specialized hardware extensions or access to source code, and source code is often unavailable for the GPU-accelerated libraries that ML frameworks rely on heavily. In this paper, we present G-Safe, a PTX-level bounds-checking approach for GPUs that confines the GPU kernels of each application to the memory partition allocated to that application. G-Safe relies on three mechanisms: (1) It divides the common GPU address space into separate partitions for different applications. (2) It intercepts and checks data transfers, fencing erroneous operations. (3) It instruments all GPU kernels at the PTX level (which is available even in closed-source GPU libraries), fencing all kernel memory accesses that fall outside the application's memory bounds. We implement G-Safe as an external, dynamically linked library that can be pre-loaded at application startup time. G-Safe's approach is transparent to applications and supports real-life, complex frameworks, such as Caffe and PyTorch, that issue billions of GPU kernels. Our evaluation shows that the overhead of G-Safe relative to native (unprotected) execution for such frameworks is between 4% and 12%, with an average of 9%.
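The three mechanisms can be illustrated with a minimal host-side sketch in Python. It is not G-Safe's implementation: the abstract does not say how partitions are laid out or how an out-of-bounds access is fenced. The sketch assumes power-of-two partitions aligned to their size, so that fencing reduces to a bit mask, and every name in it (`Partition`, `make_partitions`, `checked_copy`) is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """Hypothetical per-application slice of the shared GPU address space."""
    base: int  # first address of the partition; assumed aligned to `size`
    size: int  # partition size in bytes; assumed to be a power of two

    def __post_init__(self):
        if self.size <= 0 or self.size & (self.size - 1):
            raise ValueError("size must be a power of two")
        if self.base % self.size:
            raise ValueError("base must be aligned to size")

    def contains_range(self, addr: int, nbytes: int) -> bool:
        """True if [addr, addr + nbytes) lies entirely inside the partition."""
        return (nbytes >= 0
                and self.base <= addr
                and addr + nbytes <= self.base + self.size)

    def fence(self, addr: int) -> int:
        """Keep only the offset bits of addr and re-attach the partition base.

        In-bounds addresses are unchanged; out-of-bounds addresses are
        redirected into the owner's own partition (an assumed policy).
        """
        return self.base | (addr & (self.size - 1))


def make_partitions(space_base: int, partition_size: int, count: int):
    """Mechanism (1): split one address range into equal partitions."""
    return [Partition(space_base + i * partition_size, partition_size)
            for i in range(count)]


def checked_copy(part: Partition, dst: int, nbytes: int):
    """Mechanism (2): stand-in for an intercepted host-to-device transfer."""
    if not part.contains_range(dst, nbytes):
        raise PermissionError("transfer leaves the application's partition")
    return dst, nbytes  # a real interposer would forward the call here


# Two applications sharing one (made-up) 32 MiB address range.
app_a, app_b = make_partitions(0x10000000, 0x1000000, 2)

# Mechanism (3): the check an instrumented load or store would perform.
inside = app_a.fence(0x10000ABC)   # already in A's partition
outside = app_a.fence(0x11000010)  # targets B's partition
```

Under these assumptions, an address that application A forms inside application B's partition is redirected into A's own partition instead of reaching B's data, and a transfer that crosses the partition end is rejected before it is forwarded. A real system could instead trap or terminate the offending kernel; the abstract does not state which policy G-Safe uses.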