Datalog is a declarative logic-programming language used for complex analytic reasoning workloads such as program analysis and graph analytics. Its popularity stems from pairing logic-defined specifications with the potential for massive data parallelism. While traditional engines are CPU-based, the memory-bound nature of Datalog has led to increasing interest in GPUs. GPU engines outperform CPU-based engines by implementing iterated relational joins with SIMT-friendly join algorithms. Unfortunately, all existing GPU Datalog engines are built on binary joins, which are inadequate for the complex multi-way queries arising in production systems such as DOOP and ddisasm. For these queries, binary decomposition can incur an asymptotic blowup in time and space beyond the AGM bound, leading to out-of-memory (OOM) failures regardless of join order. Worst-Case Optimal Joins (WCOJ) avoid this blowup, but their attribute-at-a-time intersections map poorly to SIMT hardware under key skew, causing severe load imbalance across Streaming Multiprocessors (SMs). We present SRDatalog, the first GPU Datalog engine based on WCOJ. SRDatalog uses flat columnar storage and two-phase deterministic memory allocation to avoid both the OOM failures of binary joins and the index-rebuild overheads of static WCOJ systems. To mitigate skew and hide hardware stalls, SRDatalog further employs root-level histogram-guided load balancing, structural helper-relation splitting, and stream-aligned rule multiplexing. On real-world program-analysis workloads, SRDatalog achieves geometric-mean speedups of 21x to 47x.
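The gap between binary joins and attribute-at-a-time evaluation can be seen on a small triangle query, Tri(a, b, c) :- R(a, b), S(b, c), T(c, a). The following is a minimal CPU-side sketch in Python, not SRDatalog's GPU implementation; the function names (`index`, `triangles_binary`, `triangles_generic`, `star`) and the star-shaped input are assumptions chosen purely for illustration. The binary plan materializes R ⋈ S before filtering with T, while the attribute-at-a-time plan binds a, then b, then c by set intersection and never stores the intermediate relation.

```python
from collections import defaultdict


def index(pairs):
    """Map each first-column value to the set of second-column values."""
    idx = defaultdict(set)
    for x, y in pairs:
        idx[x].add(y)
    return idx


def triangles_binary(R, S, T):
    """Binary plan: materialize R(a,b) JOIN S(b,c), then filter with T(c,a)."""
    s_idx = index(S)
    intermediate = [(a, b, c) for a, b in R for c in s_idx.get(b, ())]
    t_set = set(T)
    result = {(a, b, c) for a, b, c in intermediate if (c, a) in t_set}
    return result, len(intermediate)


def triangles_generic(R, S, T):
    """Attribute-at-a-time plan: bind a, then b, then c via intersections."""
    r_idx, s_idx = index(R), index(S)
    t_rev = index((a, c) for c, a in T)  # a -> {c : T(c, a)}
    result = set()
    partial = 0  # number of (a, b) bindings that were extended
    for a in r_idx.keys() & t_rev.keys():
        for b in s_idx.keys() & r_idx[a]:
            partial += 1
            for c in s_idx[b] & t_rev[a]:
                result.add((a, b, c))
    return result, partial


def star(n):
    """Hypothetical skewed input: a hub node 0 linked both ways to 1..n."""
    return [(0, i) for i in range(1, n + 1)] + [(i, 0) for i in range(1, n + 1)]


if __name__ == "__main__":
    E = star(50)
    print(triangles_binary(E, E, E)[1])   # size of the materialized R JOIN S
    print(triangles_generic(E, E, E)[1])  # number of (a, b) bindings explored
```

On the star input with n = 50, the binary plan materializes n² + n = 2550 intermediate tuples and the attribute-at-a-time plan explores 2n = 100 (a, b) bindings, yet both return an empty result. Because the three relations are identical here, starting from a different pair of relations gives the same intermediate size. The hub node also shows the skew problem in miniature: a single key owns almost all of the intersection work, which is the kind of imbalance that load balancing on a GPU has to address.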