When LLMs process structured data, the serialization format directly affects cost and context utilization. Standard JSON wastes tokens by repeating key names in every row of a tabular array, an overhead that scales linearly with row count. This paper presents JTON (JSON Tabular Object Notation), a strict JSON superset whose main idea, Zen Grid, factors column headers into a single row and encodes values with semicolons, preserving JSON's type system while cutting redundancy. Across seven real-world domains, Zen Grid reduces token counts by 15-60% relative to compact JSON (28.5% on average; 32% with bare_strings). Comprehension tests on 10 LLMs show a net accuracy gain of 0.3 percentage points over JSON: four models improve, three hold steady, and three dip slightly. Generation tests on 12 LLMs yield 100% syntactic validity in both few-shot and zero-shot settings. A Rust/PyO3 reference implementation provides SIMD-accelerated parsing that runs at 1.4x the speed of Python's json module. Code, a 683-vector test suite, and all experimental data are publicly available.
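The header-factoring idea can be illustrated with a minimal Python sketch. It does not use JTON or Zen Grid syntax, which this abstract does not specify. It only moves the repeated keys into one header list and serializes the result as ordinary JSON. The helper names `factor_headers` and `restore_rows` and the sample rows are hypothetical, and the comparison counts characters, not tokens.

```python
import json


def factor_headers(rows):
    """Split a list of uniform dicts into (header, value_rows).

    Assumes every row has the same keys in the same order.
    """
    if not rows:
        return [], []
    header = list(rows[0].keys())
    for row in rows:
        if list(row.keys()) != header:
            raise ValueError("rows must share the same keys in the same order")
    return header, [[row[key] for key in header] for row in rows]


def restore_rows(header, value_rows):
    """Rebuild the original list of dicts from the factored form."""
    return [dict(zip(header, values)) for values in value_rows]


# Hypothetical sample data, for illustration only.
rows = [
    {"id": 1, "name": "Ada", "active": True},
    {"id": 2, "name": "Linus", "active": False},
    {"id": 3, "name": "Grace", "active": True},
]

header, value_rows = factor_headers(rows)

# Compact JSON repeats every key in every row.
compact = json.dumps(rows, separators=(",", ":"))
# The factored form states each key once, then lists values only.
factored = json.dumps({"header": header, "rows": value_rows}, separators=(",", ":"))

print(len(compact), len(factored))
```

Because the key names appear once instead of once per row, the saving grows with the number of rows, which is the linear overhead the abstract describes. Actual token savings depend on the tokenizer and on the real Zen Grid encoding.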