Large Language Models (LLMs) for code generation boost productivity but frequently introduce Knowledge Conflicting Hallucinations (KCHs): subtle semantic errors, such as non-existent API parameters, that evade linters and cause runtime failures. Existing mitigations, such as constrained decoding or non-deterministic LLM-in-the-loop repair, are often unreliable for these errors. This paper investigates whether a deterministic, static-analysis framework can reliably detect \textit{and} auto-correct KCHs. We propose a post-processing framework that parses generated code into an Abstract Syntax Tree (AST) and validates it against a dynamically generated Knowledge Base (KB) built via library introspection. This non-executing approach uses deterministic rules to find and fix both API-level and identifier-level conflicts. On a manually curated dataset of 200 Python snippets, our framework detected KCHs with 100\% precision and 87.6\% recall (0.934 F1-score), and successfully auto-corrected 77.0\% of all identified hallucinations. Our findings demonstrate that this deterministic post-processing approach is a viable and reliable alternative to probabilistic repair, offering a clear path toward trustworthy code generation.
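To make the pipeline concrete, the following is a minimal, hypothetical sketch of the detect-and-correct idea the abstract describes: parse a generated snippet into an AST, build a tiny knowledge base entry by introspecting the real library, flag keyword arguments that conflict with the introspected signature, and propose the closest valid name as a deterministic fix. The function names (`kb_params`, `find_kwarg_conflicts`) are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of KB-backed KCH detection; not the paper's code.
import ast
import difflib
import importlib
import inspect

def kb_params(module_name, func_name):
    """Introspect a library function to collect its valid parameter names
    (a one-entry 'knowledge base')."""
    func = getattr(importlib.import_module(module_name), func_name)
    params = inspect.signature(func).parameters.values()
    if any(p.kind is p.VAR_KEYWORD for p in params):
        return None  # **kwargs accepts anything; cannot validate statically
    return {p.name for p in params}

def find_kwarg_conflicts(source):
    """Flag keyword arguments absent from the introspected signature and
    suggest the closest valid name as a deterministic correction."""
    conflicts = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and isinstance(node.func.value, ast.Name)):
            try:
                known = kb_params(node.func.value.id, node.func.attr)
            except (ImportError, AttributeError, ValueError, TypeError):
                continue  # call target not resolvable; skip rather than guess
            if known is None:
                continue
            for kw in node.keywords:
                if kw.arg is not None and kw.arg not in known:
                    fix = difflib.get_close_matches(kw.arg, known, n=1)
                    conflicts.append(
                        (node.func.attr, kw.arg, fix[0] if fix else None))
    return conflicts

# 'flgs' is a hallucinated parameter; the valid name is 'flags'.
snippet = "import re\nout = re.sub('a', 'b', text, flgs=0)\n"
print(find_kwarg_conflicts(snippet))  # [('sub', 'flgs', 'flags')]
```

The snippet is only parsed, never executed, matching the non-executing property claimed above; a real system would also cover positional arity, deprecated identifiers, and attribute-level conflicts.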