Effective data processing depends on the quality of the underlying data. However, quality issues such as inconsistencies and uncertainties, can significantly impede the processing and subsequent use of data. Despite the centrality of data quality to a wide range of computational tasks, there is currently no broadly accepted, domain-independent consensus on the definition of data quality. Existing frameworks primarily define data quality in ways that are tailored to specific domains, data types, or contexts of use. Although quality assessment frameworks exist for specific domains, such as electronic health record data and linked data, corresponding approaches for descriptive information about cultural heritage objects remain underdeveloped. Moreover, existing quality definitions are often theoretical in nature and lack empirical validation based on real-world data problems. In this paper, we address these limitations by first defining a set of quality dimensions specifically designed to capture the characteristics of descriptive information about cultural heritage objects. Our definition is based on an in-depth analysis of existing dimensions and is illustrated through domain-specific examples. We then evaluate the practical applicability of our proposed quality definition using a curated set of real-world data quality problems from the cultural heritage domain. This empirical evaluation substantiates our definition of data quality, resulting in a comprehensive definition of data quality in this domain.
翻译:有效的数据处理依赖于基础数据的质量。然而,诸如不一致性和不确定性等质量问题会严重阻碍数据的处理及后续使用。尽管数据质量对广泛的计算任务至关重要,但目前尚未就数据质量的定义达成广泛接受、独立于领域的共识。现有框架主要以针对特定领域、数据类型或使用情境的方式定义数据质量。尽管存在针对特定领域的质量评估框架,例如电子健康记录数据和关联数据,但针对文化遗产对象描述信息的相应方法仍不完善。此外,现有的质量定义往往本质上是理论性的,缺乏基于现实世界数据问题的实证验证。在本文中,我们首先通过定义一组专门设计用于捕捉文化遗产对象描述信息特征的质量维度来应对这些局限性。我们的定义基于对现有维度的深入分析,并通过领域特定示例进行阐释。随后,我们使用来自文化遗产领域的一组精心筛选的真实世界数据质量问题,评估了我们所提出的质量定义的实际适用性。这一实证评估证实了我们的数据质量定义,从而得出了该领域数据质量的全面定义。