Data races in GPU programs threaten the reliability of GPU-accelerated software stacks. Prior work has proposed various dynamic (runtime) and static (compile-time) techniques to detect races in GPU programs. However, dynamic techniques often miss critical races because they require the races to manifest during testing. Static techniques can catch such races, but they often generate numerous false alarms by conservatively assuming values of variables and parameters that can never occur in any execution of the program. We make the key observation that the host (CPU) code that launches GPU kernels contains crucial semantic information about the values that a GPU kernel's parameters can take during execution. Harnessing this hitherto overlooked information helps detect data races in GPU kernel code accurately. We create HGRD, a new state-of-the-art static analysis technique that performs a holistic analysis of both CPU and GPU code to accurately detect a broad set of true races while minimizing false alarms. While state-of-the-art dynamic techniques, such as iGUARD, miss many true races, HGRD misses none. Conversely, static techniques such as GPUVerify and FaialAA raise tens of false alarms, whereas HGRD raises none.
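The key observation can be illustrated with a toy model. The sketch below is not HGRD's analysis, which is static; it is a brute-force enumeration written only to show why host-side constraints on kernel parameters matter. It assumes a hypothetical kernel whose body is `out[tid * stride] = ...`, and the names `racy_indices`, `may_race`, and `stride`, as well as the chosen parameter ranges, are illustrative assumptions rather than anything taken from the paper.

```python
from collections import defaultdict


def racy_indices(num_threads, stride):
    """Model a hypothetical kernel body `out[tid * stride] = ...`.

    Returns the set of indices of `out` that more than one thread
    writes, i.e. the locations with a write-write conflict.
    """
    writers = defaultdict(list)
    for tid in range(num_threads):
        writers[tid * stride].append(tid)
    return {idx for idx, tids in writers.items() if len(tids) > 1}


def may_race(num_threads, possible_strides):
    """True if any stride value considered possible yields a conflict."""
    return any(racy_indices(num_threads, s) for s in possible_strides)


NUM_THREADS = 8

# Kernel-only view: nothing is known about `stride`, so a conservative
# analysis must also consider stride == 0 (assumed sample range 0..3).
kernel_only = may_race(NUM_THREADS, range(0, 4))

# Host-aware view: suppose the (hypothetical) launch sites in the host
# code only ever pass stride 1 or 2.
host_aware = may_race(NUM_THREADS, [1, 2])

print(kernel_only, host_aware)  # True False
```

In this model, the kernel-only view reports a race because `stride == 0` makes every thread write `out[0]`. If the host code never passes that value, the report is a false alarm, and restricting the analysis to the values the host can actually supply removes it.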