Language models exhibit fundamental limitations -- hallucination, brittleness, and a lack of formal grounding -- that are particularly problematic in high-stakes specialist fields requiring verifiable reasoning. I investigate whether formal domain ontologies can enhance language model reliability through retrieval-augmented generation. Using mathematics as a proof of concept, I implement a neuro-symbolic pipeline that leverages the OpenMath ontology with hybrid retrieval and cross-encoder reranking to inject relevant definitions into model prompts. Evaluation on the MATH benchmark with three open-source models reveals that ontology-guided context improves performance when retrieval quality is high, but that irrelevant context actively degrades it -- highlighting both the promise and the challenges of neuro-symbolic approaches.
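The retrieve-then-rerank step described above can be sketched as follows. This is a minimal, illustrative toy: the definition store, the scoring functions, and the prompt template are all hypothetical stand-ins (a real pipeline would use BM25 for lexical retrieval, a dense bi-encoder for semantic retrieval, and a trained cross-encoder for reranking over OpenMath content dictionaries).

```python
# Hedged sketch of hybrid retrieval + reranking to inject a definition
# into a model prompt. All components here are toy stand-ins, not the
# paper's actual implementation.
from collections import Counter
from math import sqrt

DEFINITIONS = {  # hypothetical mini definition store
    "derivative": "The derivative of a function measures its instantaneous rate of change.",
    "integral": "The integral of a function measures the signed area under its curve.",
    "prime": "A prime number is an integer greater than 1 with no divisors other than 1 and itself.",
}

def lexical_score(query: str, doc: str) -> float:
    """Token-overlap score, a stand-in for BM25."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def dense_score(query: str, doc: str) -> float:
    """Bag-of-words cosine similarity, a stand-in for a dense bi-encoder."""
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    dot = sum(q[t] * d[t] for t in q)
    norm = sqrt(sum(v * v for v in q.values())) * sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0

def hybrid_retrieve(query: str, k: int = 2, alpha: float = 0.5):
    """Blend lexical and dense scores; keep the top-k candidate definitions."""
    scored = [
        (alpha * lexical_score(query, text) + (1 - alpha) * dense_score(query, text),
         name, text)
        for name, text in DEFINITIONS.items()
    ]
    return sorted(scored, reverse=True)[:k]

def rerank(query: str, candidates):
    """Stand-in for a cross-encoder: rescore each (query, doc) pair jointly."""
    return sorted(candidates, key=lambda c: dense_score(query, c[2]), reverse=True)

def build_prompt(question: str) -> str:
    """Inject the single top-ranked definition into the model prompt."""
    top = rerank(question, hybrid_retrieve(question))
    context = top[0][2] if top else ""
    return f"Definition: {context}\n\nQuestion: {question}"

prompt = build_prompt("What is the derivative of x^2?")
```

The threshold question the abstract raises — when to inject context at all — would in practice be handled by gating on the reranker score, so that low-confidence retrievals fall back to an unaugmented prompt.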