Data races are egregious parallel programming bugs on CPUs. They are even worse on GPUs, whose hierarchical thread and memory structure makes it possible to write code that is correctly synchronized within a thread group but not across groups. Thus far, all major data-race checkers for GPUs suffer from at least one of the following problems: they do not check races in global memory, do not work on recent GPUs, scale poorly, have not been extensively tested, miss simple data races, or are not dependable without detailed knowledge of the compiler. Our new data-race detection tool, HiRace, overcomes these limitations. Its key novelty is a parallel finite-state machine that condenses an arbitrarily long access history into a constant-length state, allowing HiRace to handle large and long-running programs. HiRace is a dynamic tool that checks for races in both thread-group shared memory and global device memory. It relies on source-code instrumentation, thus avoiding driver, compiler, and hardware dependencies. We evaluate it on a modern calibrated data-race benchmark suite. On the 580 tested CUDA kernels, 346 of which contain data races, HiRace finds races that other tools miss, reports no false alarms, and is on average more than 10 times faster than the current state of the art while incurring only half the memory overhead.
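To illustrate how an access history of arbitrary length can be condensed into a constant-length state, the following is a minimal sequential sketch in C++. It is not HiRace's actual state machine, which is parallel and accounts for the GPU's synchronization hierarchy. The state names and the helpers `Shadow`, `Access`, `record_access`, and `has_race` are hypothetical, and the sketch assumes that all recorded accesses target one memory location and fall within a single synchronization-free interval.

```cpp
#include <cassert>
#include <initializer_list>

// Hypothetical per-location states; not taken from HiRace.
enum class Kind { Empty, ReadOne, ReadMany, WriteOne, Race };

// Constant-size summary of every access seen so far to one location.
struct Shadow {
    Kind kind = Kind::Empty;
    int thread = -1;  // meaningful only in ReadOne and WriteOne
};

struct Access {
    int thread;     // global thread id (assumed unique per thread)
    bool is_write;  // true for a store, false for a load
};

// Fold one access into the summary. No list of past accesses is kept.
void record_access(Shadow& s, const Access& a) {
    switch (s.kind) {
    case Kind::Empty:
        s.kind = a.is_write ? Kind::WriteOne : Kind::ReadOne;
        s.thread = a.thread;
        break;
    case Kind::ReadOne:
        if (a.thread == s.thread) {
            // Same thread upgrades its own read to a write.
            if (a.is_write) s.kind = Kind::WriteOne;
        } else {
            // A second thread: reads may coexist, a write conflicts.
            s.kind = a.is_write ? Kind::Race : Kind::ReadMany;
        }
        break;
    case Kind::ReadMany:
        // At least one earlier reader differs from any writer.
        if (a.is_write) s.kind = Kind::Race;
        break;
    case Kind::WriteOne:
        // Any access by another thread conflicts with the write.
        if (a.thread != s.thread) s.kind = Kind::Race;
        break;
    case Kind::Race:
        break;  // absorbing state: once racy, always reported
    }
}

// Replay a history and report whether it contains a conflict.
bool has_race(std::initializer_list<Access> history) {
    Shadow s;
    for (const Access& a : history) record_access(s, a);
    return s.kind == Kind::Race;
}
```

For instance, `has_race({{0, false}, {1, false}})` is false because concurrent reads do not conflict, whereas `has_race({{0, true}, {1, false}})` is true. The size of `Shadow` stays fixed regardless of how many accesses are replayed, which is the property the abstract attributes to the constant-length state.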