Towards High-Value Datasets determination for data-driven development: a systematic literature review

Open government data (OGD) is seen as a political and socio-economic phenomenon that promises to promote civic engagement and stimulate public sector innovation in various areas of public life. To bring the expected benefits, data must be reused and transformed into value-added products or services. This, in turn, sets another precondition: data are expected not only to be available and comply with open data principles, but also to be of value, i.e., of interest for reuse by the end-user. This refers to the notion of the 'high-value dataset' (HVD), recognized by the European Data Portal as a key trend in the OGD area in 2022. There is progress in this direction, e.g., the Open Data Directive, which identifies 6 key categories, a list of HVDs, and arrangements for their publication and re-use. However, these can be seen as 'core' or 'base' datasets aimed at increasing the interoperability of public sector data with a high priority, contributing to the development of a more mature OGD initiative. Depending on the specifics of a region and country (geographical location; social, environmental, and economic issues; cultural characteristics; (under)developed sectors; and market specificities), more datasets can be recognized as of high value for a particular country. Yet there is no standardized approach to assist chief data officers in this. In this paper, we present a systematic review of the existing literature on HVD determination, which is expected to form an initial knowledge base for this process, including the approaches and indicators used to determine HVDs, the data, and the stakeholders.

