Evaluating retrieval-augmented generation (RAG) pipelines requires corpora where ground truth is knowable, temporally structured, and cross-artifact properties that real-world datasets rarely provide cleanly. Existing resources such as the Enron corpus carry legal ambiguity, demographic skew, and no structured ground truth. Purely LLM-generated synthetic data solves the legal problem but introduces a subtler one: the generating model cannot be prevented from hallucinating facts that contradict themselves across documents.We present OrgForge, an open-source multi-agent simulation framework that enforces a strict physics-cognition boundary: a deterministic Python engine maintains a SimEvent ground truth bus; large language models generate only surface prose, constrained by validated proposals. An actor-local clock enforces causal timestamp correctness across all artifact types, eliminating the class of timeline inconsistencies that arise when timestamps are sampled independently per document. We formalize three graph-dynamic subsystems stress propagation via betweenness centrality, temporal edge-weight decay, and Dijkstra escalation routing that govern organizational behavior independently of any LLM. Running a configurable N-day simulation, OrgForge produces interleaved Slack threads, JIRA tickets, Confluence pages, Git pull requests, and emails, all traceable to a shared, immutable event log. We additionally describe a causal chain tracking subsystem that accumulates cross-artifact evidence graphs per incident, a hybrid reciprocal-rank-fusion recurrence detector for identifying repeated failure classes, and an inbound/outbound email engine that routes vendor alerts, customer complaints, and HR correspondence through gated causal chains with probabilistic drop simulation. OrgForge is available under the MIT license.
翻译:评估检索增强生成(RAG)流水线需要具备可确知性、时间结构化以及跨文档属性的语料库,而现实世界的数据集很少能清晰地提供这些特性。现有资源(如安然语料库)存在法律模糊性、人口统计偏差且缺乏结构化真实标注。完全由大语言模型生成的合成数据解决了法律问题,却引入了一个更微妙的问题:无法阻止生成模型在不同文档间产生相互矛盾的虚构事实。我们提出了OrgForge,一个开源的多智能体仿真框架,它强制执行严格的物理-认知边界:一个确定性的Python引擎维护着SimEvent真实标注总线;大语言模型仅生成表层文本,并受已验证提案的约束。参与者本地时钟确保所有文档类型的因果时间戳正确性,消除了因各文档独立采样时间戳而产生的时间线不一致问题。我们形式化了三个图动态子系统——基于中介中心性的压力传播、时间边权衰减和Dijkstra升级路由——这些系统独立于任何大语言模型来管理组织行为。通过运行可配置的N天仿真,OrgForge生成交错的Slack线程、JIRA工单、Confluence页面、Git拉取请求和电子邮件,所有内容均可追溯至共享的不可变事件日志。我们还描述了一个因果链追踪子系统,该系统会为每个事件积累跨文档证据图;一种用于识别重复故障类别的混合互逆排序融合递归检测器;以及一个入站/出站电子邮件引擎,该引擎通过带概率丢弃模拟的门控因果链来路由供应商警报、客户投诉和人力资源往来邮件。OrgForge基于MIT许可证发布。