Reproducible Domain-Specific Knowledge Graphs in the Life Sciences: a Systematic Literature Review

Knowledge graphs (KGs) are widely used to represent and organize structured knowledge in diverse domains. However, creating and maintaining KGs poses substantial challenges. Developing a KG demands extensive expertise in data modeling, ontology design, and data curation. Furthermore, KGs are dynamic and require continuous updates and quality control to remain accurate and relevant. These intricacies contribute to the considerable effort required for their development and maintenance. One critical dimension of KGs that warrants attention is reproducibility. The ability to replicate and validate KGs is fundamental to ensuring the trustworthiness and sustainability of the knowledge they represent. Reproducible KGs not only support open science by allowing others to build upon existing knowledge but also enhance transparency and reliability in disseminating information. Despite the growing number of domain-specific KGs, a comprehensive analysis of their reproducibility has been lacking. This paper addresses this gap by offering a general overview of domain-specific KGs and comparing them against various reproducibility criteria. Our study across 19 domains shows that only eight of 250 domain-specific KGs (3.2%) provide publicly available source code. Among these, only one system passed our reproducibility assessment (14.3%). These findings highlight the challenges and gaps in achieving reproducibility across domain-specific KGs. Our finding that only 0.4% of published domain-specific KGs are reproducible shows a clear need for further research and a shift in cultural practices.
