Procedural knowledge describes how to accomplish tasks and mitigate problems. Such knowledge is commonly held by domain experts, e.g. operators in manufacturing who adjust parameters to achieve quality targets. To the best of our knowledge, no real-world datasets containing process data and corresponding procedural knowledge are publicly available, possibly because companies fear losing their knowledge advantages. Therefore, we provide a framework for generating synthetic datasets that can be adapted to different domains. Its design choices are inspired by two real-world datasets of procedural knowledge to which we have access. In addition to representing procedural knowledge in Resource Description Framework (RDF)-compliant knowledge graphs, the framework simulates parametrisation processes and provides consistent process data. We compare established embedding methods on the resulting knowledge graphs, detailing which out-of-the-box methods have the potential to represent procedural knowledge. This provides a baseline that can be used to increase the comparability of future work. Furthermore, we validate the overall characteristics of a synthesised dataset by comparing the results to those achievable on a real-world dataset. The framework and evaluation code, as well as the dataset used in the evaluation, are available as open source.