Graph databases (GDBs) are crucial in academic and industry applications. The key challenges in developing GDBs are achieving high performance, scalability, programmability, and portability. To tackle these challenges, we harness established practices from the HPC landscape to build a system that outperforms all past GDBs presented in the literature by orders of magnitude, for both OLTP and OLAP workloads. For this, we first identify and crystallize performance-critical building blocks in the GDB design, and abstract them into a portable and programmable API specification, called the Graph Database Interface (GDI), inspired by the best practices of MPI. We then use GDI to design a GDB for distributed-memory RDMA architectures. Our implementation harnesses one-sided RDMA communication and collective operations, and it offers architecture-independent theoretical performance guarantees. The resulting design achieves extreme scales of more than a hundred thousand cores. Our work will facilitate the development of next-generation extreme-scale graph databases.