Knowledge Graph (KG) processing faces critical infrastructure challenges in selecting optimal NoSQL database paradigms, because traditional performance evaluations rely on static benchmarks that fail to capture the complexity of real-world KG workloads. Although the big data field offers numerous comparative studies, DBMS selection in the KG context remains predominantly ad hoc, leaving practitioners without systematic guidance for matching storage technologies to specific KG characteristics and query requirements. This paper presents a KG-specific benchmarking framework that combines connectivity density, scale, and a newly introduced graph-centric metric, Semantic Richness (SR), within a four-tier query methodology to reveal performance crossover points across Document-Oriented, Graph, and Multi-Model DBMSs. We conduct an empirical evaluation on the FAERS adverse event KG at three scales, comparing the paradigms on queries that range from simple filtering to deep traversal, and provide metric-driven, evidence-based guidelines for aligning NoSQL paradigm selection with graph size, connectivity, and semantic richness.