GPU compilers place values of every data type in a single unified register file, erasing the type information that binary-analysis tools rely on. We show that recovering types from this untyped register file is the central challenge of GPU binary lifting. We present CuLifter, a SASS-to-LLVM IR lifting framework that recovers register types via constraint propagation with conflict detection, reconstructs explicit control flow, and aggregates multi-instruction patterns. Across eight benchmark suites (24,437 GPU functions in 919 cubins) spanning open-source applications, vendor libraries, and optimized ML runtimes, CuLifter successfully lifts 99.98% of functions to valid LLVM IR. An ablation study confirms that type recovery is necessary for producing semantically correct IR: disabling it drops the x86 pass rate from 73.8% to 0%.
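To make "constraint propagation with conflict detection" concrete, the following is a minimal, flow-insensitive sketch and not CuLifter's actual algorithm. The opcode table, the `(opcode, dest, sources)` instruction tuples, and the function name `recover_types` are all hypothetical, chosen only for illustration. The sketch also treats each register as a single variable for the whole function, whereas a real lifter would presumably have to account for registers being reused with different types at different program points.

```python
from collections import defaultdict

# Hypothetical opcode table: the type each opcode imposes on its operands.
# Illustrative only; this is not a real SASS semantics table.
OPCODE_TYPES = {"FADD": "f32", "FMUL": "f32", "IADD": "i32", "SHL": "i32"}


def recover_types(instructions):
    """Infer one type per register and report conflicting registers.

    Each instruction is a tuple (opcode, dest, sources). A MOV imposes no
    type of its own but forces its destination and source to agree.
    Returns (types, conflicts).
    """
    types = {}                # register -> inferred type
    conflicts = set()         # registers that received incompatible types
    equal = defaultdict(set)  # register -> registers that must share its type
    worklist = []             # registers whose type still has to be propagated

    def assign(reg, ty):
        if reg not in types:
            types[reg] = ty
            worklist.append(reg)
        elif types[reg] != ty:
            # Conflict: keep the first type and flag the register.
            conflicts.add(reg)

    # Pass 1: collect direct constraints and equality edges.
    for opcode, dest, sources in instructions:
        if opcode == "MOV":
            src = sources[0]
            equal[dest].add(src)
            equal[src].add(dest)
        elif opcode in OPCODE_TYPES:
            for reg in (dest, *sources):
                assign(reg, OPCODE_TYPES[opcode])

    # Pass 2: propagate known types along equality edges until a fixpoint.
    while worklist:
        reg = worklist.pop()
        for other in equal[reg]:
            assign(other, types[reg])

    return types, conflicts
```

In this sketch, a register copied by `MOV` from a float result and then consumed by an integer add ends up in `conflicts`. Such a conflict is the kind of place where a lifter would need to emit an explicit reinterpretation (for example an LLVM `bitcast`) or split the register into separately typed values.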