Data Quality (DQ) describes the degree to which the characteristics of data meet requirements and make the data fit for use by humans and/or systems. DQ can be measured along several aspects, called DQ dimensions (e.g., accuracy, completeness, consistency), also referred to as characteristics in the literature. The ISO/IEC 25012 standard defines a data quality model with fifteen such dimensions, setting the requirements a data product should meet. In this short report, we aim to systematically bridge the gap between the lower-level functionalities offered by DQ tools and the higher-level dimensions, revealing the many-to-many relationships between them. To this end, we examine six open-source DQ tools and focus on mapping the functionalities they offer to the DQ dimensions defined by the ISO standard. Wherever applicable, we also provide insights into the software engineering details that the tools leverage to address DQ challenges.