WarpGuard: Protected-Site Control-Flow Integrity for CUDA SASS Binaries

Recent CUDA exploitation work shows that GPU memory bugs can escalate into device-side control-flow corruption when kernels later consume corrupted return continuations, function pointers, dispatch-table entries, or branch targets. For deployed CUDA binaries, the relevant security boundary is the NVIDIA SASS that actually executes. That code reflects PTX lowering, inlining, ABI decisions, register allocation, spills, and predication, and it runs under SIMT execution, so source- or PTX-level policies do not capture this boundary. We present WarpGuard, to our knowledge the first protected-site control-flow integrity (CFI) system for CUDA device binaries that operates on executed SASS. WarpGuard enforces CFI at protected sites: recovered SASS instructions or instruction sequences that consume control-flow state and provide sufficient binary evidence to derive a policy. Each protected transfer is checked before it is released and fails closed on violation. WarpGuard authenticates backward-edge continuation state for instrumented returns and validates recoverable forward-edge targets per site. It reports fixed-edge, unsupported, profile-excluded, fallback, and no-surface outcomes separately, outside the protected denominator. Across 77 CUDA artifacts, WarpGuard classifies 51,621 SASS control-flow sites, including 1,343 returns and 154 supported forward target-set entries, and records 52.2 million dynamic checks. In representative backward- and forward-edge corruption attacks, native execution reaches attacker-selected behavior, detect-only mode records the expected violation, and enforcement mode fails closed before releasing the invalid protected transfer. Evidence from public code shows that the same SASS consumption patterns occur in real CUDA systems, including runtime dispatch tables, cuFFT callbacks, generated callable tables, and uploaded device-function pointers. WarpGuard delivers auditable protected-site CFI for CUDA SASS. Its evaluation separates enforcement under dynamic instrumentation from measurements of callback-free SASS timing and patch-cache feasibility.
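The check semantics described above are to authenticate a saved return continuation, validate a forward target against a per-site set, and then either record the violation or fail closed. They can be modeled on the host as a minimal Python sketch. This is not WarpGuard's implementation, which runs inline in instrumented SASS. The sketch assumes a truncated HMAC-SHA256 tag over a hypothetical frame identifier and continuation address for backward edges, and a hypothetical `forward_targets` map from site ID to recovered target set for forward edges. The names `ProtectedSiteChecker`, `CFIViolation`, `frame`, and `site_id` are illustrative only.

```python
import hashlib
import hmac
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


class CFIViolation(Exception):
    """Raised in enforce mode so the invalid transfer is never released."""


@dataclass
class ProtectedSiteChecker:
    # Illustrative host-side model; the MAC scheme and all names are assumptions.
    key: bytes
    enforce: bool = True  # False models detect-only mode
    # Hypothetical per-site recovered forward target sets: site_id -> allowed targets.
    forward_targets: Dict[int, Set[int]] = field(default_factory=dict)
    # Recorded violations as (site_id, edge kind, offending target).
    violations: List[Tuple[int, str, int]] = field(default_factory=list)

    def tag(self, frame: int, continuation: int) -> bytes:
        # Truncated keyed MAC binding a return continuation to a (hypothetical) frame ID.
        msg = frame.to_bytes(8, "little") + continuation.to_bytes(8, "little")
        return hmac.new(self.key, msg, hashlib.sha256).digest()[:8]

    def _violation(self, site_id: int, kind: str, target: int) -> None:
        # Detect-only records and continues; enforce records and then fails closed.
        self.violations.append((site_id, kind, target))
        if self.enforce:
            raise CFIViolation(f"site {site_id}: invalid {kind} target {target:#x}")

    def check_return(self, site_id: int, frame: int, continuation: int, saved_tag: bytes) -> int:
        # Backward edge: authenticate the continuation before releasing the return.
        if not hmac.compare_digest(self.tag(frame, continuation), saved_tag):
            self._violation(site_id, "return", continuation)
        return continuation

    def check_forward(self, site_id: int, target: int) -> int:
        # Forward edge: the loaded target must be in this site's recovered set.
        if target not in self.forward_targets.get(site_id, set()):
            self._violation(site_id, "forward", target)
        return target


# Usage: a valid return is released; a corrupted forward target fails closed.
checker = ProtectedSiteChecker(key=b"\x01" * 32, forward_targets={7: {0x1000, 0x1040}})
saved = checker.tag(frame=0x7F00, continuation=0x2A0)
assert checker.check_return(site_id=3, frame=0x7F00, continuation=0x2A0, saved_tag=saved) == 0x2A0
try:
    checker.check_forward(site_id=7, target=0xDEAD)
except CFIViolation as err:
    print("blocked:", err)  # prints "blocked: site 7: invalid forward target 0xdead"
```

With `enforce=False` the checker models detect-only mode. A corrupted target is appended to `violations` and still returned. In the default enforce mode, the checker raises before the transfer is handed back to the caller.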
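The text above counts only protected sites in the denominator and reports fixed-edge, unsupported, profile-excluded, fallback, and no-surface outcomes separately. The sketch below models that accounting under stated assumptions, and it is not WarpGuard's published classifier. The `Site` fields and the order in which the predicates are tested are hypothetical choices for illustration. So is the mapping of `kind == "direct"` to fixed-edge and `kind == "none"` to no-surface.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, Tuple


class Outcome(Enum):
    PROTECTED = "protected"
    FIXED_EDGE = "fixed-edge"
    UNSUPPORTED = "unsupported"
    PROFILE_EXCLUDED = "profile-excluded"
    FALLBACK = "fallback"
    NO_SURFACE = "no-surface"


@dataclass(frozen=True)
class Site:
    # Hypothetical summary of one recovered SASS control-flow site.
    kind: str                  # assumed values: "return", "indirect", "direct", "none"
    policy_derivable: bool     # enough binary evidence to derive a policy
    profile_excluded: bool = False
    instrumented: bool = True  # False: instrumentation fell back for this site


def classify(site: Site) -> Outcome:
    # The order of these checks is an illustrative assumption.
    if site.kind == "none":
        return Outcome.NO_SURFACE
    if site.kind == "direct":
        return Outcome.FIXED_EDGE  # immediate target, no corruptible state
    if site.profile_excluded:
        return Outcome.PROFILE_EXCLUDED
    if not site.policy_derivable:
        return Outcome.UNSUPPORTED
    if not site.instrumented:
        return Outcome.FALLBACK
    return Outcome.PROTECTED


def report(sites: Iterable[Site]) -> Tuple[int, Dict[str, int]]:
    # Protected sites form the denominator; every other outcome is listed apart.
    counts = Counter(classify(s) for s in sites)
    protected = counts[Outcome.PROTECTED]
    outside = {o.value: n for o, n in counts.items() if o is not Outcome.PROTECTED}
    return protected, outside


sites = [Site("return", True), Site("indirect", True), Site("direct", True), Site("indirect", False)]
protected, outside = report(sites)
print(protected, outside)  # prints "2 {'fixed-edge': 1, 'unsupported': 1}"
```

Excluded outcomes are counted in `outside` instead of being folded into a single coverage ratio. This lets a reader audit why each unprotected site was left unchecked.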
