RDF streaming has been explored by the Semantic Web community from many angles, resulting in multiple task formulations and streaming methods. However, for many existing formulations of the problem, reliably benchmarking streaming solutions has been challenging due to the lack of well-described and appropriately diverse benchmark datasets. With a few notable exceptions, existing datasets and evaluations suffer from unclear streaming task scopes, underspecified benchmarks, and errors in the data. To address these issues, we propose RiverBench, an open and collaborative RDF streaming benchmark suite. RiverBench leverages continuous, community-driven processes, established best practices (e.g., FAIR), and built-in quality guarantees. The suite distributes datasets in a common, accessible format, with clear documentation, licensing, and machine-readable metadata. The current release includes a diverse collection of non-synthetic datasets generated by the Semantic Web community, representing many applications of RDF data streaming, all major task formulations, and emerging RDF features (RDF-star). Finally, we present a list of research applications for the suite, demonstrating its versatility and value even beyond the realm of RDF streaming.