We study event-graph substrates: a class of world models that represent agent state as an append-only log of typed RDF triples and answer counterfactual queries by forking the log under a structured intervention vocabulary. Substrates are inspectable at the triple level, support exact counterfactuals, and transfer across domains without learned components. We formalize the class, prove a duality between explanatory and counterfactual queries that reduces both to the same causal-ancestor traversal, and evaluate a 1,400-line CLEVRER-DSL interpreter atop a domain-agnostic substrate runtime at full CLEVRER validation scale (n=75,618). The substrate exceeds the NS-DR symbolic oracle on all four question categories (by 9.89, 20.26, 17.65, and 0.80 percentage points), and exceeds the parametric ALOE baseline on descriptive and explanatory questions while lagging on predictive and counterfactual questions. We also introduce twin-EventLog, a 500-specification Park-canonical Smallville counterfactual benchmark on which the substrate exceeds Llama-3.1-8B with full context by 18.80 points in joint accuracy.
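To make the log-forking idea concrete, the following is a minimal Python sketch, not the substrate runtime evaluated in the paper. It assumes a toy model in which each event is a typed triple annotated with the ids of its direct causes, an event occurs only if all of its listed causes occur, and the only intervention is deleting one event. The names `Event`, `EventLog`, `explain`, and `fork_without` are hypothetical and chosen for illustration. Under these assumptions, an explanatory query returns the causal ancestors of an event, and a counterfactual query asks whether the deleted event is among those ancestors, so both use the same traversal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One typed triple plus the ids of the events that directly caused it."""
    eid: int
    subject: str
    predicate: str
    obj: str
    causes: tuple = ()


class EventLog:
    """Toy append-only log: events can be added or forked away, never edited."""

    def __init__(self, events=(), next_id=0):
        self._events = {e.eid: e for e in events}
        self._next_id = next_id

    def __contains__(self, eid):
        return eid in self._events

    def append(self, subject, predicate, obj, causes=()):
        # Causes must already be in the log, so the causal graph stays acyclic.
        for c in causes:
            if c not in self._events:
                raise KeyError(f"unknown cause id {c}")
        eid = self._next_id
        self._events[eid] = Event(eid, subject, predicate, obj, tuple(causes))
        self._next_id += 1
        return eid

    def ancestors(self, eid):
        """Return the ids of all direct and transitive causes of `eid`."""
        seen = set()
        stack = list(self._events[eid].causes)
        while stack:
            cur = stack.pop()
            if cur not in seen:
                seen.add(cur)
                stack.extend(self._events[cur].causes)
        return seen

    def explain(self, eid):
        """Explanatory query: which earlier events led to `eid`?"""
        return sorted(self.ancestors(eid))

    def fork_without(self, removed):
        """Counterfactual query: fork the log as if `removed` never happened.

        The original log is left untouched. The fork drops `removed` and
        every event that has `removed` among its causal ancestors.
        """
        kept = [
            e for e in self._events.values()
            if e.eid != removed and removed not in self.ancestors(e.eid)
        ]
        return EventLog(kept, self._next_id)


# A small CLEVRER-style scene (object names are illustrative only).
log = EventLog()
e0 = log.append("cube", "moves", "right")
e1 = log.append("cube", "collides_with", "sphere", causes=[e0])
e2 = log.append("sphere", "moves", "right", causes=[e1])
e3 = log.append("cylinder", "enters", "scene")
e4 = log.append("sphere", "collides_with", "cylinder", causes=[e2, e3])

why_e4 = log.explain(e4)         # [0, 1, 2, 3]
no_cube = log.fork_without(e0)   # keeps only the cylinder entering (e3)
```

In this sketch, `e4 in log.fork_without(x)` holds exactly when `x` is neither `e4` nor one of its causal ancestors, which is the toy analogue of the duality stated above. A fuller treatment would need a richer intervention vocabulary and events with alternative causes, both of which are outside this simplification.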