Toward Epistemic Stability: Engineering Consistent Procedures for Industrial LLM Hallucination Reduction

Hallucinations in large language models (LLMs) are outputs that are syntactically coherent but factually incorrect or contextually inconsistent. They remain a persistent obstacle in high-stakes industrial settings such as engineering design, enterprise resource planning, and IoT telemetry platforms. We present and compare five prompt engineering strategies intended to reduce the variance of model outputs and move toward repeatable, grounded results without modifying model weights or building complex validation models. The methods are: (M1) Iterative Similarity Convergence, (M2) Decomposed Model-Agnostic Prompting, (M3) Single-Task Agent Specialization, (M4) Enhanced Data Registry, and (M5) Domain Glossary Injection. Each method is evaluated against an internal baseline using an LLM-as-Judge framework over 100 repeated runs per method (same fixed task prompt, stochastic decoding at temperature τ = 0.7). Under this evaluation setup, M4 (Enhanced Data Registry) received "Better" verdicts in all 100 trials; M3 and M5 reached 80% and 77%, respectively; M1 reached 75%; and M2 was net negative at 34% when compared to single-shot prompting with a modern foundation model. We then developed enhanced version 2 (v2) implementations and assessed them on a 10-trial verification batch; M2 recovered from 34% to 80%, the largest gain among the four revised methods. We discuss how these strategies help mitigate the non-deterministic nature of LLM results for industrial procedures, even when absolute correctness cannot be guaranteed. We provide pseudocode, verbatim prompts, and batch logs to support independent assessment.
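To make the reported percentages concrete, the following is a minimal sketch of how a "Better" rate over repeated runs could be tallied. It is not the paper's implementation: the function `better_rate`, the toy stand-ins `toy_method`, `toy_baseline`, and `toy_judge`, and the verdict labels other than "Better" are all hypothetical and introduced only for illustration. In a real setup the three callables would wrap model calls under stochastic decoding.

```python
import random
from collections import Counter

# Assumption: only "Better" is named in the abstract; "Same" and "Worse"
# are hypothetical labels added so the judge has a complete verdict set.
VERDICTS = ("Better", "Same", "Worse")


def better_rate(method, baseline, judge, n_runs=100, seed=0):
    """Return the fraction of runs in which the judge says "Better".

    method, baseline: callables taking a random.Random and returning a str
    (stand-ins for stochastic LLM calls on the same fixed task prompt).
    judge: callable taking (candidate, reference) and returning a verdict.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_runs):
        candidate = method(rng)
        reference = baseline(rng)
        verdict = judge(candidate, reference)
        if verdict not in VERDICTS:
            raise ValueError(f"unexpected verdict: {verdict!r}")
        counts[verdict] += 1
    return counts["Better"] / n_runs


# Toy stand-ins (hypothetical): no model is called here.
def toy_method(rng):
    return "grounded"


def toy_baseline(rng):
    return "grounded" if rng.random() < 0.5 else "ungrounded"


def toy_judge(candidate, reference):
    if candidate == reference:
        return "Same"
    return "Better" if candidate == "grounded" else "Worse"


if __name__ == "__main__":
    rate = better_rate(toy_method, toy_baseline, toy_judge, n_runs=100)
    print(f"Better rate over 100 runs: {rate:.0%}")
```

Under this reading, a figure such as 80% simply means that 80 of 100 judged runs were labeled "Better"; on a 10-trial batch the same percentage rests on only 8 verdicts, so it carries much more sampling uncertainty.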
