Every document format in existence was designed for a human reader moving linearly through text. Autonomous LLM agents do not read; they retrieve. This fundamental mismatch forces agents to inject entire documents into their context windows, wasting tokens on irrelevant content, compounding state across multi-turn loops, and broadcasting information indiscriminately across agent roles. We argue that this is not a prompt engineering problem, a retrieval problem, or a compression problem: it is a format problem. We introduce OBJECTGRAPH (.og), a file format that reconceives the document as a typed, directed knowledge graph to be traversed rather than a string to be injected. OBJECTGRAPH is a strict superset of Markdown (every .md file is a valid .og file), requires no infrastructure beyond a two-primitive query protocol, and is readable by both humans and agents without tooling. We formalize the Document Consumption Problem, characterize six structural properties that no existing format satisfies simultaneously, and prove that OBJECTGRAPH satisfies all six. We further introduce the Progressive Disclosure Model, the Role-Scoped Access Protocol, and Executable Assertion Nodes as native format primitives. Empirical evaluation across five document classes and eight agent task types demonstrates token reductions of up to 95.3 percent with no statistically significant degradation in task accuracy (p > 0.05). Transpiler fidelity reaches 98.7 percent content preservation on a held-out document benchmark.