Interoperability-oriented Quality Assessment for Czech Open Data

With the rapid increase in published open datasets, it is crucial to support the progress of open data in smart cities while also considering open data quality. In the Czech Republic and its National Open Data Catalogue (NODC), open datasets are usually evaluated based on their metadata only, leaving the content and its adherence to the recommended data structure to the sole responsibility of the data providers. As a result, the interoperability of open datasets remains unknown. This paper therefore proposes a novel content-aware quality evaluation framework that assesses the quality of open datasets along five data quality dimensions. With the proposed framework, we provide a fundamental view of the interoperability-oriented data quality of the Czech open datasets published in NODC. Our evaluations find that domain-specific open data quality assessments can detect data quality issues beyond the traditional heuristics used for determining Czech open data quality, and can thereby help increase the datasets' interoperability and their potential to bring value to society. The findings of this research are beneficial not only for the Czech Republic but can also be applied in other countries that intend to enhance their open data quality evaluation processes.


