An Open-Source Knowledge Graph Ecosystem for the Life Sciences

Tiffany J. Callahan, Ignacio J. Tripodi, Adrianne L. Stefanski, Luca Cappelletti, Sanya B. Taneja, Jordan M. Wyrwa, Elena Casiraghi, Nicolas A. Matentzoglu, Justin Reese, Jonathan C. Silverstein, Charles Tapley Hoyt, Richard D. Boyce, Scott A. Malec, Deepak R. Unni, Marcin P. Joachimiak, Peter N. Robinson, Christopher J. Mungall, Emanuele Cavalleri, Tommaso Fontana, Giorgio Valentini, Marco Mesiti, Lucas A. Gillenwater, Brook Santangelo, Nicole A. Vasilevsky, Robert Hoehndorf, Tellen D. Bennett, Patrick B. Ryan, George Hripcsak, Michael G. Kahn, Michael Bada, William A. Baumgartner Jr, Lawrence E. Hunter

Translational research requires data at multiple scales of biological organization. Advancements in sequencing and multi-omics technologies have increased the availability of these data, but researchers face significant integration challenges. Knowledge graphs (KGs) are used to model complex phenomena, and methods exist to construct them automatically. However, tackling complex biomedical integration problems requires flexibility in the way knowledge is modeled. Moreover, existing KG construction methods provide robust tooling at the cost of fixed or limited choices among knowledge representation models. PheKnowLator (Phenotype Knowledge Translator) is a semantic ecosystem for automating the FAIR (Findable, Accessible, Interoperable, and Reusable) construction of ontologically grounded KGs with fully customizable knowledge representation. The ecosystem includes KG construction resources (e.g., data preparation APIs), analysis tools (e.g., SPARQL endpoints and abstraction algorithms), and benchmarks (e.g., prebuilt KGs and embeddings). We evaluated the ecosystem by systematically comparing it to existing open-source KG construction methods and by analyzing its computational performance when used to construct 12 large-scale KGs. With flexible knowledge representation, PheKnowLator enables fully customizable KGs without compromising performance or usability.
