This paper introduces RDFGraphGen, a general-purpose, domain-independent generator of synthetic RDF graphs based on SHACL constraints. The Shapes Constraint Language (SHACL) is a W3C standard for validating data in RDF graphs against constraints expressed as shapes. Although the main purpose of SHACL is the validation of existing RDF data, we envisioned and implemented a reverse role for it in order to address the lack of available RDF datasets in many RDF-based application development processes: we use SHACL shape definitions as the starting point for generating synthetic data for an RDF graph. The generation process extracts the constraints from the SHACL shapes, converts them into rules, and then generates artificial data for a predefined number of RDF entities based on these rules. RDFGraphGen is intended for generating small, medium, or large RDF knowledge graphs for benchmarking, testing, quality control, training, and similar tasks in applications from the RDF, Linked Data, and Semantic Web domains. RDFGraphGen is open-source and is available as a ready-to-use Python package.
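The three-step process described in the abstract (extract constraints, convert them into rules, generate data for a fixed number of entities) can be illustrated with a minimal sketch. This is not RDFGraphGen's actual API or implementation: the shape is a hypothetical, hand-written Python dictionary standing in for a parsed SHACL node shape, the `http://example.org/` IRIs and the names `PERSON_SHAPE`, `constraint_to_rule`, and `generate_graph` are assumptions made for illustration, and only a handful of constraints (`sh:datatype`, `sh:in`, `sh:minInclusive`, `sh:maxInclusive`, `sh:minCount`, `sh:maxCount`) are covered.

```python
import random

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical, simplified stand-in for a parsed SHACL node shape.
# Keys mirror SHACL constraint names (sh:targetClass, sh:path, sh:datatype, ...).
PERSON_SHAPE = {
    "targetClass": "http://example.org/Person",
    "properties": [
        {"path": "http://example.org/name", "datatype": "xsd:string",
         "minCount": 1, "maxCount": 1},
        {"path": "http://example.org/age", "datatype": "xsd:integer",
         "minInclusive": 0, "maxInclusive": 120, "minCount": 1, "maxCount": 1},
        {"path": "http://example.org/role", "in": ["author", "editor"],
         "minCount": 0, "maxCount": 2},
    ],
}


def constraint_to_rule(prop):
    """Convert one property constraint into (path, value_maker, min, max)."""
    if "in" in prop:
        allowed = list(prop["in"])
        make = lambda rng: rng.choice(allowed)      # enumerated values
    elif prop.get("datatype") == "xsd:integer":
        lo = prop.get("minInclusive", 0)
        hi = prop.get("maxInclusive", 1000)          # assumed default bound
        make = lambda rng: rng.randint(lo, hi)       # bounded integer
    else:
        make = lambda rng: "value-%d" % rng.randint(0, 9999)  # placeholder string
    return prop["path"], make, prop.get("minCount", 0), prop.get("maxCount", 1)


def generate_graph(shape, entity_count, seed=0):
    """Generate (subject, predicate, object) triples for entity_count entities."""
    rng = random.Random(seed)
    rules = [constraint_to_rule(p) for p in shape["properties"]]
    triples = []
    for i in range(entity_count):
        subject = "http://example.org/entity/%d" % i
        triples.append((subject, RDF_TYPE, shape["targetClass"]))
        for path, make, min_count, max_count in rules:
            wanted = rng.randint(min_count, max_count)
            # An RDF graph is a set of triples, so draw distinct values;
            # the attempt cap avoids looping forever on tiny value spaces.
            values, attempts = set(), 0
            while len(values) < wanted and attempts < 100:
                values.add(make(rng))
                attempts += 1
            for value in sorted(values, key=str):
                triples.append((subject, path, value))
    return triples
```

Calling `generate_graph(PERSON_SHAPE, 3)` returns a list of triples describing three synthetic `Person` entities. A real generator would read the shapes from an RDF serialization and support a far larger part of SHACL; the sketch only shows how validation constraints can be reinterpreted as generation rules.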