Large language models can produce fluent judgments for clinical natural language inference, yet they frequently fail when the decision requires the correct inferential schema rather than surface matching. We introduce CARENLI, a compartmentalised agentic framework that routes each premise-statement pair to a reasoning family and then applies a specialised solver with explicit verification and targeted refinement. We evaluate on an expanded CTNLI benchmark of 200 instances spanning four reasoning families: Causal Attribution, Compositional Grounding, Epistemic Verification, and Risk State Abstraction. Across four contemporary backbone models, CARENLI improves mean accuracy from about 23% with direct prompting to about 57%, a gain of roughly 34 points, with the largest benefits on structurally demanding reasoning types. These results support compartmentalisation plus verification as a practical route to more reliable and auditable clinical inference.
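The route-then-solve-then-verify loop described above can be sketched as follows. This is a minimal illustrative skeleton, not the actual CARENLI implementation: the function names (`route`, `solve`, `verify`, `infer`), the keyword-cue router, and the stub solver are all assumptions made for exposition.

```python
# Hypothetical sketch of a compartmentalised route -> solve -> verify loop.
# All names and heuristics here are illustrative, not the CARENLI system itself.

FAMILIES = ["causal_attribution", "compositional_grounding",
            "epistemic_verification", "risk_state_abstraction"]

def route(premise, statement):
    """Stand-in router: pick a reasoning family from simple keyword cues."""
    cues = {"cause": "causal_attribution",
            "dose": "compositional_grounding",
            "reported": "epistemic_verification",
            "risk": "risk_state_abstraction"}
    text = (premise + " " + statement).lower()
    for cue, family in cues.items():
        if cue in text:
            return family
    return "epistemic_verification"  # fallback family

def solve(family, premise, statement):
    """Stand-in solver: each family would apply its own inferential schema."""
    return {"family": family, "label": "entailment", "trace": []}

def verify(result):
    """Stand-in verifier: check the solver's output against the label space."""
    return result["label"] in {"entailment", "contradiction", "neutral"}

def infer(premise, statement, max_refinements=2):
    """Route the pair, solve within the chosen family, then verify;
    re-invoke the solver (targeted refinement) if verification fails."""
    family = route(premise, statement)
    result = solve(family, premise, statement)
    for _ in range(max_refinements):
        if verify(result):
            break
        result = solve(family, premise, statement)  # targeted refinement
    return result
```

In this sketch the compartmentalisation lives in the router: each premise-statement pair is dispatched to exactly one family-specific solver, and the verifier gates whether a refinement pass is needed.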