With the rapid increase in published open datasets, it is crucial to support open data progress in smart cities while taking open data quality into account. In the Czech Republic's National Open Data Catalogue (NODC), open datasets are usually evaluated on the basis of their metadata only, leaving the content and its adherence to the recommended data structure to the sole responsibility of the data providers. The interoperability of open datasets remains unknown. This paper therefore proposes a novel content-aware quality evaluation framework that assesses the quality of open datasets along five data quality dimensions. With the proposed framework, we provide a fundamental view of the interoperability-oriented data quality of the Czech open datasets published in the NODC. Our evaluations find that domain-specific open data quality assessments can detect data quality issues beyond those caught by the traditional heuristics used for determining Czech open data quality, and can thereby help increase the datasets' interoperability and their potential to bring value to society. The findings of this research are beneficial not only for the Czech Republic but can also be applied in other countries that intend to enhance their open data quality evaluation processes.