In the contemporary data-driven landscape, ensuring data quality (DQ) is crucial for deriving actionable insights from vast data repositories. The objective of this study is to explore the potential for automating data quality management within data warehouses, the data repositories commonly used by large organizations. Through a systematic review of existing DQ tools available in the market and in the academic literature, the study assesses their capability to automatically detect and enforce data quality rules. The review encompassed 151 tools from various sources and revealed that most current tools focus on data cleansing and fixing in domain-specific databases rather than in data warehouses. Only ten tools demonstrated the capability to detect DQ rules, let alone to do so in data warehouses. The findings underscore a significant gap in the market and in academic research regarding AI-augmented DQ rule detection in data warehouses. This paper advocates further development in this area to enhance the efficiency of DQ management processes, reduce human workload, and lower costs. The study highlights the need for advanced tools for automated DQ rule detection, paving the way for improved data quality management practices tailored to data warehouse environments. It can also guide organizations in selecting the data quality tool that best meets their requirements.