In this paper, we introduce GraphLake, a purpose-built graph compute engine for the Lakehouse. GraphLake is built on top of the commercial graph database TigerGraph. It maps Lakehouse tables to vertex and edge types in a labeled property graph and supports graph analytics over those tables using GSQL. To minimize startup time, it loads only the graph topology. To keep queries over Lakehouse tables efficient, it further introduces a graph-aware caching mechanism and two Lakehouse-optimized parallel primitives. Extensive experiments demonstrate that GraphLake significantly outperforms PuppyGraph, the current state-of-the-art graph compute engine for the Lakehouse, achieving both lower startup time and lower query time.
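
To make the topology-only loading idea concrete, the following is a minimal Python sketch and not GraphLake's implementation. It assumes that the topology can be built from the key columns of an edge table alone and that vertex properties are read from the source tables only when a query needs them. The tables, column names, and the helpers `load_topology` and `fetch_property` are hypothetical, introduced purely for illustration.

```python
from collections import defaultdict

# Hypothetical stand-ins for Lakehouse tables: each table is a list of rows.
person_rows = [
    {"id": 1, "name": "Ann", "city": "Oslo"},
    {"id": 2, "name": "Bob", "city": "Rome"},
    {"id": 3, "name": "Cy", "city": "Lima"},
]
knows_rows = [
    {"src": 1, "dst": 2, "since": 2019},
    {"src": 2, "dst": 3, "since": 2021},
]


def load_topology(edge_rows, src_col, dst_col):
    """Build an adjacency list from the two key columns of an edge table.

    Only the source and destination keys are read; edge properties such as
    "since" are ignored at load time (an assumption of this sketch).
    """
    adjacency = defaultdict(list)
    for row in edge_rows:
        adjacency[row[src_col]].append(row[dst_col])
    return dict(adjacency)


def fetch_property(vertex_rows, key_col, vertex_id, prop):
    """Look up one vertex property from the table when a query asks for it."""
    for row in vertex_rows:
        if row[key_col] == vertex_id:
            return row[prop]
    raise KeyError(vertex_id)


# Startup: load only the topology of the hypothetical "knows" edge type.
topology = load_topology(knows_rows, "src", "dst")

# Query: two-hop neighbors of vertex 1, traversed purely over the topology.
two_hop = [w for v in topology.get(1, []) for w in topology.get(v, [])]

# Properties are fetched from the vertex table only for the result set.
names = [fetch_property(person_rows, "id", v, "name") for v in two_hop]
```

In this sketch the traversal never touches the property columns, so only the final result set triggers a read of the vertex table. A real engine would replace the linear scan in `fetch_property` with indexed or cached access.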