Tabular data plays an essential role in many data analytics and machine learning tasks. Typically, tabular data does not possess any machine-readable semantics. In this context, semantic table interpretation is crucial for making data analytics workflows more robust and explainable. This article proposes Tab2KG, a novel method that targets the interpretation of tables with previously unseen data and automatically infers their semantics in order to transform them into semantic data graphs. We introduce original lightweight semantic profiles that enrich a domain ontology's concepts and relations and represent domain and table characteristics. We propose a one-shot learning approach that relies on these profiles to map a tabular dataset containing previously unseen instances to a domain ontology. In contrast to existing semantic table interpretation approaches, Tab2KG relies only on the semantic profiles and does not require any instance lookup. This property makes Tab2KG particularly suitable for the data analytics context, in which data tables typically contain new instances. Our experimental evaluation on several real-world datasets from different application domains demonstrates that Tab2KG outperforms state-of-the-art semantic table interpretation baselines.
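The idea of mapping columns through profiles rather than instance lookup can be illustrated with a minimal sketch. It is not the Tab2KG implementation: the profile features (mean, standard deviation, minimum, maximum), the relative-difference distance, the `ex:` relation names, and all sample values are assumptions made up for illustration, and a plain nearest-profile search stands in for the learned one-shot model described in the abstract.

```python
from statistics import mean, pstdev


def profile(values):
    """Summarize a numeric column as a small feature tuple.

    Hypothetical stand-in for a lightweight semantic profile.
    """
    return (mean(values), pstdev(values), min(values), max(values))


def distance(p, q):
    """Sum of relative differences per feature, so large and small scales compare fairly."""
    return sum(abs(a - b) / (abs(a) + abs(b) + 1e-9) for a, b in zip(p, q))


def map_columns(table, domain_profiles):
    """Assign each column the ontology relation whose profile is closest.

    No cell value is looked up in a knowledge graph; only profiles are compared.
    """
    mapping = {}
    for name, values in table.items():
        column_profile = profile(values)
        mapping[name] = min(
            domain_profiles,
            key=lambda relation: distance(column_profile, domain_profiles[relation]),
        )
    return mapping


# Hypothetical domain profiles attached to two ontology relations.
domain_profiles = {
    "ex:temperatureCelsius": profile([12.5, 18.0, 21.3, 25.1, 30.2]),
    "ex:populationCount": profile([12000, 54000, 250000, 1200000]),
}

# A new table whose values do not appear in the domain data above.
table = {
    "col_a": [14.2, 19.9, 27.5, 22.0],
    "col_b": [8000, 310000, 95000],
}

mapping = map_columns(table, domain_profiles)
print(mapping)
```

Because only summary statistics are compared, the sketch still assigns a relation to columns whose individual values have never been seen before, which is the property the abstract highlights for data analytics settings.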