Pramana: A Protocol-Layer Treatment of Claim Verification in Autonomous Agent Networks

Autonomous agents deployed in regulated domains must produce a verification artifact per consequential output: a record an auditor can re-execute offline, capturing what was claimed, against what source, by whom, when, and how. Production verification today splits into two unstandardized halves. Probabilistic verdict patterns (self-consistency voting, reviewer LLM ensembles) produce judgments, not artifacts. Artifact-producing patterns (RAG, tool-augmented traces, generator-verifier loops) produce vendor-specific records no external auditor can reconstruct without bespoke integration. Pramana defines the missing wire format. Every consequential agent output is wrapped in a typed ClaimAttestation with one of four variants (measurement, inference, analogy, citation), each paired with a verify() operation against the recorded source. verify() is deterministic for MeasurementClaim and CitationClaim. For InferenceClaim and AnalogyClaim, determinism is conditional on the oracle (audit-replayable when LLM-backed). The four-way typology derives from classical Indian epistemology (pramana, valid means of knowledge). The lifecycle is specified in TLA+ and exhaustively verified under TLC across three symmetry-reduced models: 38,563 distinct reachable states, zero invariant violations. The Python reference implementation passes 84 tests. An A2A and MCP wire-extension manifest layers three deployment-grade invariants: reachability, SLA bound, and offline re-verifiability. An exploratory pilot (n=100, 2,275 reviewer calls) probes LLM-as-judge in code generation. The strongest observation is a 40-percentage-point raw FPR delta across corpora, consistent with reference-solution quality contributing significantly. The pilot does not validate Pramana on its own; the structural argument and formal verification do that.

翻译：在受监管领域部署的自主智能体，必须为每个关键输出生成可验证的制品：一份可供审计员离线重放的记录，准确捕获所声明的具体内容、信息来源、声明主体、时间戳及验证方式。当前的生产级验证体系分裂为两个未标准化的分支。概率性裁决模式（自一致性投票、评审者大模型集成）产生的是判决结果而非可验证制品；而制品生成模式（检索增强生成、工具增强轨迹、生成器-验证器循环）产出的是厂商特有记录，外部审计员需通过定制化集成才能重构验证过程。Pramana定义了缺失的传输协议格式：每个关键智能体输出被封装为带类型的ClaimAttestation，包含四种变体（测量、推理、类比、引用），每种变体均配有针对记录源执行的verify()操作。对于MeasurementClaim和CitationClaim，verify()具有确定性；而对于InferenceClaim与AnalogyClaim，确定性取决于预言机（当由大模型驱动时可实现审计重放）。四维分类体系源于古印度知识论（pramana，有效认知手段）。其生命周期以TLA+形式化规约，经TLC模型检测器在三个对称约减模型上全面验证：共发现38,563个可达状态，零不变违反。Python参考实现通过84项测试。A2A与MCP协议扩展清单封装了三个部署级不变量：可达性、服务等级协议边界及离线重验证能力。探索性试点实验（n=100，2,275次评审调用）检验了大模型作为裁判在代码生成中的表现，最显著的观测结果为跨语料库原始假阳性率差值达40个百分点，该差异与参考解质量高度相关。需注意，该试点并非对Pramana自身的验证——其结构性论证与形式化验证方构成核心证据。