Existing LLM agent frameworks lack formal semantics: there is no principled way to determine whether an agent configuration is well-formed or will terminate. We present $λ_A$, a typed lambda calculus for agent composition that extends the simply-typed lambda calculus with oracle calls, bounded fixpoints (the ReAct loop), probabilistic choice, and mutable environments. We prove type safety, termination of bounded fixpoints, and soundness of derived lint rules, with partial Coq mechanization (1,567 lines, 43 completed proofs). As a practical application, we derive a lint tool that detects structural configuration errors directly from the operational semantics. An evaluation on 835 real-world GitHub agent configurations shows that 94.1% are structurally incomplete under $λ_A$, with YAML-only lint precision at 54%, rising to 96--100% under joint YAML+Python AST analysis on 175 samples. This gap quantifies, for the first time, the degree of semantic entanglement between declarative configuration and imperative code in the agent ecosystem. We further show that five mainstream paradigms (LangGraph, CrewAI, AutoGen, OpenAI SDK, Dify) embed as typed $λ_A$ fragments, establishing $λ_A$ as a unifying calculus for LLM agent composition.
翻译:现有的LLM智能体框架缺乏形式化语义:缺少确定智能体配置是否结构良好或是否会终止的原则性方法。我们提出$λ_A$,一种用于智能体组合的类型化λ演算,它在简单类型λ演算基础上扩展了预言机调用、有界不动点(ReAct循环)、概率选择和可变环境。我们证明了类型安全性、有界不动点的终止性以及派生lint规则的正确性,并提供了部分Coq机械化实现(1567行代码,43个已完成证明)。作为实际应用,我们从操作语义中派生出一个lint工具,能够直接检测结构配置错误。对835个真实世界GitHub智能体配置的评估显示,在$λ_A$下94.1%的配置结构不完整,其中纯YAML的lint精确率为54%,而对175个样本进行YAML+Python AST联合分析时,精确率升至96-100%。这一差距首次量化了智能体生态系统中声明式配置与命令式代码之间的语义纠缠程度。我们进一步证明,五种主流范式(LangGraph、CrewAI、AutoGen、OpenAI SDK、Dify)可嵌入为类型化的$λ_A$片段,确立了$λ_A$作为LLM智能体组合的统一演算。