Existing LLM agent frameworks lack formal semantics: there is no principled way to determine whether an agent configuration is well-formed or will terminate. We present $λ_A$, a typed lambda calculus for agent composition that extends the simply-typed lambda calculus with oracle calls, bounded fixpoints (the ReAct loop), probabilistic choice, and mutable environments. We prove type safety, termination of bounded fixpoints, and soundness of derived lint rules, with full Coq mechanization (1,519 lines, 42 theorems, 0 Admitted). As a practical application, we derive a lint tool that detects structural configuration errors directly from the operational semantics. An evaluation on 835 real-world GitHub agent configurations shows that 94.1% are structurally incomplete under $λ_A$, with YAML-only lint precision at 54%, rising to 96--100% under joint YAML+Python AST analysis on 175 samples. This gap quantifies, for the first time, the degree of semantic entanglement between declarative configuration and imperative code in the agent ecosystem. We further show that five mainstream paradigms (LangGraph, CrewAI, AutoGen, OpenAI SDK, Dify) embed as typed $λ_A$ fragments, establishing $λ_A$ as a unifying calculus for LLM agent composition.
翻译:现有LLM智能体框架缺乏形式语义:缺乏判断智能体配置是否良构或能否终止的原则性方法。我们提出$λ_A$,一种用于智能体组合的类型化λ演算,它在简单类型λ演算基础上扩展了预言机调用、有界不动点(ReAct循环)、概率选择与可变环境。我们证明了类型安全性、有界不动点的终止性以及衍生lint规则的可靠性,并完成完整的Coq机械化验证(1,519行代码、42条定理、0条假设)。作为实际应用,我们基于操作语义推导出检测结构配置错误的lint工具。对835个真实世界GitHub智能体配置的评估显示,94.1%在$λ_A$下存在结构不完整性,纯YAML版lint精度为54%,而在对175个样本进行YAML+Python AST联合分析后,精度提升至96-100%。这一差距首次量化了智能体生态系统中声明式配置与命令式代码之间的语义纠缠程度。我们进一步证明五种主流范式(LangGraph、CrewAI、AutoGen、OpenAI SDK、Dify)均可嵌入为类型化$λ_A$片段,确立了$λ_A$作为LLM智能体组合统一演算的地位。