Datalog is a declarative logic-programming language used for complex analytic reasoning workloads such as program analysis and graph analytics. Its popularity stems from a distinctive value proposition: it marries logic-defined specification with the potential for massive data parallelism. While traditional engines are CPU-based, the memory-bound nature of Datalog has led to increasing interest in GPU-based engines. GPU engines outperform their CPU-based counterparts by implementing iterated relational joins with SIMT-friendly join algorithms. Unfortunately, all existing GPU Datalog engines are built on binary joins, which are inadequate for the complex multi-way queries arising in production systems such as DOOP and ddisasm. For these queries, binary decomposition can produce intermediate results asymptotically larger than the AGM bound, inflating both time and space and leading to out-of-memory (OOM) failures regardless of join order. Worst-Case Optimal Joins (WCOJ) avoid this blowup, but their attribute-at-a-time intersections map poorly to SIMT hardware under key skew, causing severe load imbalance across Streaming Multiprocessors (SMs). We present SRDatalog, the first GPU Datalog engine based on WCOJ. SRDatalog uses flat columnar storage and two-phase deterministic memory allocation to avoid both the OOM failures of binary joins and the index-rebuild overheads of static WCOJ systems. To mitigate skew and hide hardware stalls, SRDatalog further employs root-level histogram-guided load balancing, structural helper-relation splitting, and stream-aligned rule multiplexing. On real-world program-analysis workloads, SRDatalog achieves geometric-mean speedups of 21x to 47x.
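The gap between binary joins and WCOJ can be illustrated on the triangle query Q(a, b, c) :- R(a, b), S(b, c), T(a, c). The following is a minimal CPU-side sketch in Python, not SRDatalog's implementation: the function names (`binary_join_triangles`, `wcoj_triangles`, `skewed_instance`) and the hash-set intersections are assumptions made purely for illustration, and nothing here models GPU execution or load balancing.

```python
from collections import defaultdict


def index(rel):
    """Map each first-column value to the set of its second-column values."""
    idx = defaultdict(set)
    for x, y in rel:
        idx[x].add(y)
    return idx


def binary_join_triangles(R, S, T):
    """Evaluate (R join S) join T; return (triangles, intermediate size)."""
    s_by_b = index(S)
    t_set = set(T)
    # First binary join materializes every (a, b, c) with R(a, b) and S(b, c).
    intermediate = [(a, b, c) for a, b in R for c in s_by_b.get(b, ())]
    # Second join filters the intermediate result against T(a, c).
    triangles = {(a, b, c) for a, b, c in intermediate if (a, c) in t_set}
    return triangles, len(intermediate)


def wcoj_triangles(R, S, T):
    """Attribute-at-a-time evaluation: bind a, then b, then c by intersection."""
    r_by_a, s_by_b, t_by_a = index(R), index(S), index(T)
    s_keys = set(s_by_b)
    triangles = set()
    for a in set(r_by_a) & set(t_by_a):          # a must appear in R and T
        for b in r_by_a[a] & s_keys:             # b must extend R(a, _) and S
            for c in s_by_b[b] & t_by_a[a]:      # c must satisfy S and T
                triangles.add((a, b, c))
    return triangles


def skewed_instance(n):
    """Star-shaped relation {0} x [1..n] union [1..n] x {0} (heavy key 0)."""
    return ([(0, i) for i in range(1, n + 1)]
            + [(i, 0) for i in range(1, n + 1)])


if __name__ == "__main__":
    n = 50
    rel = skewed_instance(n)
    tri_binary, inter = binary_join_triangles(rel, rel, rel)
    tri_wcoj = wcoj_triangles(rel, rel, rel)
    print(len(rel), inter, len(tri_binary), len(tri_wcoj))
```

With R = S = T set to this skewed instance, each relation holds 2n tuples and the query has no answers, yet the first binary join materializes n² + n intermediate tuples (by symmetry, the same holds for the other join orders). The attribute-at-a-time version never materializes that intermediate result; the per-key intersections it performs are also the step whose cost becomes uneven under key skew.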