High data quality is critical for reliable analytics and operational efficiency. A growing ecosystem of tools has emerged to support data quality management, ranging from lightweight open-source libraries to comprehensive enterprise platforms. This paper evaluates six data quality tools: Great Expectations, Deequ, Evidently, Informatica, Experian, and Ataccama. The evaluation criteria cover rule definition, duplicate detection, metric aggregation, and uncertainty handling, and were derived from real-world use cases provided by industry partners. We further examine to what extent these tools integrate Large Language Models (LLMs). Our findings show that proprietary tools offer more comprehensive measurement features and emerging LLM-based assistance, while open-source tools provide flexibility at the cost of higher implementation effort. Across all tools, LLM integration remains limited to rule creation workflows. Direct data validation through LLMs is not yet supported by any of the evaluated tools.