Graph-centric cross-model data integration and analytics (GCDIA) refers to tasks that use the graph model as the central paradigm for integrating relevant information across heterogeneous data models, such as the relational and document models, and then perform complex analytics such as regression and similarity computation. As modern applications generate increasingly diverse data and move beyond simple retrieval toward advanced analytical objectives (e.g., prediction and recommendation), GCDIA has become increasingly important. Existing multi-model databases (MMDBs) struggle to support both the integration (GCDI) and the analytics (GCDA) parts of GCDIA efficiently. For GCDI, they typically process graphs separately from the other models, with no global optimization across models; for GCDA, they rely on tuple-at-a-time execution. Both choices limit performance and scalability. To address these limitations, we propose GredoDB, a unified MMDB that natively stores graph, relational, and document data and processes GCDIA efficiently. Specifically, we design 1) topology- and attribute-aware graph operators for efficient predicate-aware traversal, 2) a unified GCDI optimization framework that exploits cross-model correlations, and 3) a parallel GCDA architecture that materializes intermediate results for operator-level execution. Experiments on the widely adopted multi-model benchmark M2Bench show that, measured by response time, GredoDB achieves a speedup over state-of-the-art (SOTA) MMDBs of up to 107.89 times (10.89 times on average) on GCDI and up to 356.72 times (37.79 times on average) on GCDA.
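To make the shape of a GCDIA task concrete, the following is a minimal Python sketch, not GredoDB's interface or implementation. All data, names, and functions in it (`customers`, `follows`, `orders`, `predicate_aware_neighbors`, `similar_neighbors`) are hypothetical and chosen only for illustration. It mimics the two phases described above: an integration step that traverses a graph while checking a relational attribute predicate during expansion, and an analytics step that runs a similarity computation over a materialized batch of document data.

```python
from math import sqrt

# Hypothetical toy data, one structure per data model.
# Relational model: customer rows keyed by id.
customers = {
    1: {"name": "Ann", "country": "DE"},
    2: {"name": "Bob", "country": "FR"},
    3: {"name": "Cid", "country": "DE"},
}
# Graph model: directed "follows" edges as adjacency lists.
follows = {1: [2, 3], 2: [3], 3: []}
# Document model: per-customer purchase-tag counts.
orders = {
    1: {"tags": {"book": 2, "music": 1}},
    2: {"tags": {"film": 3}},
    3: {"tags": {"book": 2, "music": 1}},
}


def predicate_aware_neighbors(graph, attrs, start, predicate):
    """Expand one hop from `start`, testing the attribute predicate
    while expanding instead of filtering a finished traversal result."""
    return [v for v in graph.get(start, []) if predicate(attrs[v])]


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in set(a) | set(b))
    norm_a = sqrt(sum(x * x for x in a.values()))
    norm_b = sqrt(sum(x * x for x in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_neighbors(start, country):
    # Integration step: graph traversal restricted by a relational attribute.
    neighbors = predicate_aware_neighbors(
        follows, customers, start, lambda row: row["country"] == country
    )
    # Materialize the intermediate result as one batch of document vectors.
    batch = [(v, orders[v]["tags"]) for v in neighbors]
    # Analytics step: one similarity pass over the whole batch.
    base = orders[start]["tags"]
    scored = [(v, cosine(base, tags)) for v, tags in batch]
    return sorted(scored, key=lambda pair: -pair[1])


if __name__ == "__main__":
    print(similar_neighbors(1, "DE"))
```

In this sketch the predicate is evaluated inside the neighbor expansion, so non-matching vertices never enter the intermediate result, and the similarity step consumes the whole batch at once rather than one tuple at a time. A real system would apply these ideas at the level of storage and query operators, which a toy script of this size cannot capture.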