Detecting missing foreign keys (FKs) requires accurately modeling semantic dependencies across database schemas, which conventional heuristic-based methods are fundamentally limited in capturing. We propose LLM-FK, the first fully automated multi-agent framework for FK detection, designed to address three core challenges that hinder naive LLM-based solutions on large, complex databases: combinatorial explosion of the search space, ambiguous inference under limited context, and global inconsistency arising from isolated local predictions. LLM-FK coordinates four specialized agents: a Profiler that reduces FK detection to validating candidate FK column pairs and prunes the search space via a unique-key-driven schema decomposition strategy; an Interpreter that injects self-augmented domain knowledge; a Refiner that constructs compact structural representations and performs multi-perspective chain-of-thought reasoning; and a Verifier that enforces schema-wide consistency through a holistic conflict resolution strategy. Experiments on five benchmark datasets show that LLM-FK consistently achieves F1-scores above 93% and surpasses existing baselines by 15% on the large-scale MusicBrainz database. It also reduces the candidate search space by two to three orders of magnitude without losing true FKs, and it remains robust under challenging conditions such as missing data. These results demonstrate the effectiveness and scalability of LLM-FK in real-world databases.
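To make the idea of pruning the candidate space concrete, the following is a minimal Python sketch of one way candidate FK column pairs could be filtered. It is not the Profiler's actual algorithm, which the abstract does not specify. It assumes two classic filters: the referenced column must be unique over its non-null values, and the referencing column's non-null values must be contained in it (an inclusion check). The function names `unique_columns` and `candidate_pairs`, and the toy `tables` data, are hypothetical.

```python
def unique_columns(tables):
    """Return (table, column) pairs whose non-null values are all distinct."""
    keys = []
    for tname, cols in tables.items():
        for cname, values in cols.items():
            non_null = [v for v in values if v is not None]
            if non_null and len(non_null) == len(set(non_null)):
                keys.append((tname, cname))
    return keys


def candidate_pairs(tables):
    """Enumerate (referencing, referenced) column pairs that survive pruning.

    Assumed filters (illustrative, not taken from the paper):
      1. the referenced column is unique over its non-null values;
      2. the referencing column's non-null values are a subset of it.
    """
    pairs = []
    for ptab, pcol in unique_columns(tables):
        parent_vals = {v for v in tables[ptab][pcol] if v is not None}
        for ctab, cols in tables.items():
            for ccol, values in cols.items():
                if (ctab, ccol) == (ptab, pcol):
                    continue  # a column cannot reference itself
                child_vals = {v for v in values if v is not None}
                if child_vals and child_vals <= parent_vals:
                    pairs.append(((ctab, ccol), (ptab, pcol)))
    return pairs


# Toy schema: each table maps column names to lists of values (None = missing).
tables = {
    "artist": {"id": [1, 2, 3], "name": ["A", "B", "C"]},
    "album": {
        "id": [10, 11, 12, 13],
        "artist_id": [1, 1, 3, None],
        "title": ["w", "x", "y", "z"],
    },
}

pairs = candidate_pairs(tables)
print(pairs)  # [(('album', 'artist_id'), ('artist', 'id'))]
```

In this toy schema, the 5 columns give 20 ordered column pairs, and the two filters leave a single candidate to validate. A strict inclusion check like this one can discard true FKs when data is dirty, so it should be read only as an illustration of search-space pruning in general.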