Grokers: Bottom-Up Inductive Comprehension and Write-Time Intelligence over Typed Knowledge Graphs

We present Grokers, an architecture for building persistent, structured comprehension of typed knowledge graphs through bottom-up inductive traversal of dependency subgraphs. Unlike retrieval-augmented generation (RAG), which pays the full comprehension cost at every query, Grokers pushes intelligence to write time. Autonomous Groker agents analyze nodes in a typed stream graph, extract structured attributes via governed language model (LM) calls, and inductively compose that understanding upward through dependency relations, writing enriched typed attributes that serve all future queries at zero additional LM cost. We prove three formal properties: (1) the Byte-Identity Theorem, which establishes that context blocks assembled from a transactionally maintained denormalization index are byte-identical across LM turns between semantic changes, enabling KV-cache hit rates approaching 100%; (2) the Accumulation Monotonicity Theorem, which establishes that, under a governed wisdom library growth protocol, the fraction of interactions resolved without LM calls is non-decreasing in the number of completed interactions; and (3) the Dual-Traversal Ordering Theorem, which establishes that top-down generation and bottom-up comprehension are the unique correct traversal orderings for their respective tasks over a dependency DAG, and that their composition closes into a complete generation-comprehension cycle. We further present a deterministic alternative to embedding-based semantic search, with a synonym caching protocol whose LM fallback rate converges to zero for finite-vocabulary domains. A reference implementation is provided in the open-source Qbix / Safebox / Safebots stack.
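As a rough illustration of the write-time idea, the following minimal sketch visits a small dependency DAG bottom-up, so that each node is analyzed only after everything it depends on, and then answers a query by plain lookup. It is not the reference implementation: the graph `deps`, the `summarize` function (a stand-in for a governed LM call), and the `depth` and `uses` attributes are all hypothetical names chosen for this example.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each node maps to the nodes it depends on.
deps = {
    "app": {"auth", "db"},
    "auth": {"db"},
    "db": set(),
}

lm_calls = 0  # counts invocations of the stand-in LM call


def summarize(node, child_attrs):
    """Stand-in for a governed LM call (hypothetical, not a real API).

    Composes the already-written attributes of a node's dependencies
    into attributes for the node itself.
    """
    global lm_calls
    lm_calls += 1
    depth = 1 + max((a["depth"] for a in child_attrs.values()), default=0)
    return {"depth": depth, "uses": sorted(child_attrs)}


def grok(graph):
    """Write-time pass: visit dependencies before their dependents."""
    attrs = {}
    # static_order() yields every node after all of its predecessors,
    # which here means after all of its dependencies.
    for node in TopologicalSorter(graph).static_order():
        child_attrs = {d: attrs[d] for d in graph.get(node, ())}
        attrs[node] = summarize(node, child_attrs)
    return attrs


attrs = grok(deps)
calls_after_write = lm_calls

# Query time: a lookup over stored attributes, with no further LM calls.
answer = attrs["app"]["depth"]
```

In this toy setting, each node costs one stand-in LM call at write time, and later queries only read the stored attributes, so `lm_calls` stays unchanged however many queries follow.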
