The formal verification of operating system kernels requires precise specifications that capture the intended behavior of system calls. Writing these specifications manually demands deep domain expertise, motivating the use of large language models (LLMs) to automate the process. However, in OSV-Bench, a benchmark of 245 specification generation tasks derived from the Hyperkernel OS kernel, the best reported Pass@1 is 55.10%. We propose a domain knowledge prompting method (BODHI), which augments the standard few-shot prompt with a structured C-to-Python translation guide covering 15 categories of domain-specific translation patterns. Inspired by Structured Chain-of-Thought (SCoT) prompting, the guide organizes translation by separation of concerns, addressing pre-condition extraction and post-condition generation as distinct categories. Evaluated on nine models from six providers (Anthropic, Mistral, Amazon, DeepSeek, Meta, Alibaba), covering dense, mixture-of-experts and reasoning architectures, BODHI improves every model tested, with gains ranging from +11% to +32%. The best configuration (Claude Opus 4.6 + BODHI) reaches 96.73% Pass@1. BODHI reduces both syntax and semantic errors, with the strongest effect on models that have sufficient instruction-following capability to utilize structured reference material. These results demonstrate that domain knowledge injection is a model-agnostic technique that substantially bridges the gap between general-purpose code generation and formal specification synthesis.
翻译:操作系统内核的形式化验证需要精确的规约来捕获系统调用的预期行为。手动编写这些规约要求深厚的领域专业知识,从而推动了使用大型语言模型(LLM)来自动化该过程。然而,在基于Hyperkernel操作系统的内核的OSV-Bench基准测试(包含245个规约生成任务)中,最佳报告的Pass@1仅为55.10%。我们提出了一种领域知识提示方法(BODHI),该方法通过结构化的C到Python翻译指南增强标准的少样本提示,该指南涵盖15类特定领域的翻译模式。受结构化思维链(SCoT)提示启发,该指南按关注点分离原则组织翻译,将前提条件提取和后置条件生成视为不同类别。在来自六个提供商(Anthropic、Mistral、Amazon、DeepSeek、Meta、Alibaba)的九个模型上(涵盖密集、专家混合和推理架构)进行评估后,BODHI改进了所有测试模型,提升幅度从+11%到+32%不等。最佳配置(Claude Opus 4.6 + BODHI)达到了96.73%的Pass@1。BODHI同时减少了语法错误和语义错误,对具备足够指令跟随能力以利用结构化参考材料的模型效果最为显著。这些结果表明,领域知识注入是一种与模型无关的技术,显著弥合了通用代码生成与形式化规约合成之间的差距。