Database normalization is crucial to preserving data integrity. However, because data engineers typically perform it manually, it is time-consuming and error-prone. To address this, we present Miffie, a database normalization framework that leverages large language models. Miffie automates normalization without human effort while preserving high accuracy. At its core is a dual-model self-refinement architecture that pairs the best-performing model for normalized schema generation with the best-performing model for verification. The generation module eliminates anomalies based on feedback from the verification module until the output schema satisfies the normalization requirements. We also carefully design task-specific zero-shot prompts that guide the models toward both high accuracy and cost efficiency. Experimental results show that Miffie can normalize complex database schemas while maintaining high accuracy.