A fundamental problem in applying Large Language Models (LLMs) to important applications is that LLMs do not always follow instructions, and violations are often hard to observe or check. In LLM-based agentic workflows, such violations can propagate and amplify along reasoning chains, causing task failures and system incidents. This paper presents NSVIF, a neuro-symbolic framework for verifying whether an LLM's output follows the instructions used to prompt the LLM. NSVIF is a universal, general-purpose verifier: it makes no assumptions about the instruction or the LLM. NSVIF formulates instruction-following verification as a constraint-satisfaction problem by modeling user instructions as constraints. It models both logical and semantic constraints; constraint solving is performed by a unified solver that orchestrates logical reasoning and semantic analysis. To evaluate NSVIF, we develop VIFBENCH, a new benchmark for instruction-following verifiers with fine-grained data labels. Experiments show that NSVIF significantly outperforms LLM-based approaches and provides interpretable feedback. We also show that feedback from NSVIF improves LLMs' instruction-following capability without post-training.