JSON is a popular standard for data interchange on the Internet, and ingesting JSON documents can be a performance bottleneck. A popular parsing strategy converts the input text into a tree-based data structure, sometimes called a Document Object Model (DOM). We designed and implemented a novel JSON parsing interface, called On-Demand, that appears to the programmer like a conventional DOM-based approach. However, the underlying implementation is a pointer iterating through the content, materializing the results (objects, arrays, strings, numbers) only lazily. On recent commodity processors, an implementation of our approach provides superior performance in multiple benchmarks. To ensure reproducibility, our work is freely available as open-source software. Several systems use On-Demand, e.g., Apache Doris, the Node.js JavaScript runtime, Milvus, and Velox.
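To make the idea of lazy materialization concrete, here is a minimal toy sketch in Python. It is not the On-Demand implementation or its API: the class `LazyValue` and its methods are hypothetical names chosen for illustration, the input is assumed to be well-formed JSON, and field names are decoded with the standard `json` module for simplicity. It only shows the general pattern of a DOM-like interface backed by a position in the raw text, where unrequested values are skipped rather than decoded.

```python
import json


class LazyValue:
    """Hypothetical cursor into JSON text; a value is decoded only by get()."""

    def __init__(self, text, pos=0):
        self.text = text
        self.pos = self._skip_ws(pos)

    def _skip_ws(self, i):
        while i < len(self.text) and self.text[i] in " \t\r\n":
            i += 1
        return i

    def _skip_string(self, i):
        # text[i] is the opening quote; return the index past the closing quote.
        i += 1
        while self.text[i] != '"':
            i += 2 if self.text[i] == "\\" else 1
        return i + 1

    def _skip_value(self, i):
        # Return the index just past the value at i, without building it.
        i = self._skip_ws(i)
        c = self.text[i]
        if c == '"':
            return self._skip_string(i)
        if c in "{[":
            depth = 0
            while True:
                c = self.text[i]
                if c == '"':
                    i = self._skip_string(i)  # brackets inside strings are ignored
                    continue
                if c in "{[":
                    depth += 1
                elif c in "}]":
                    depth -= 1
                    if depth == 0:
                        return i + 1
                i += 1
        # Scalar: number, true, false, or null.
        while i < len(self.text) and self.text[i] not in ",}] \t\r\n":
            i += 1
        return i

    def __getitem__(self, key):
        # DOM-like field access: scan the object, skipping non-matching values.
        if self.text[self.pos] != "{":
            raise TypeError("not an object")
        i = self._skip_ws(self.pos + 1)
        while self.text[i] != "}":
            end = self._skip_string(i)
            name = json.loads(self.text[i:end])
            i = self._skip_ws(end)       # now at ':'
            i = self._skip_ws(i + 1)     # now at the start of the value
            if name == key:
                return LazyValue(self.text, i)
            i = self._skip_ws(self._skip_value(i))
            if self.text[i] == ",":
                i = self._skip_ws(i + 1)
        raise KeyError(key)

    def get(self):
        # Materialize only this value.
        end = self._skip_value(self.pos)
        return json.loads(self.text[self.pos:end])


doc = '{"user": {"id": 42, "name": "Ada"}, "tags": ["a", "b{"], "count": 3}'
root = LazyValue(doc)
print(root["count"].get())         # the "user" and "tags" values are skipped, not built
print(root["user"]["name"].get())  # only the string "Ada" is decoded
```

The calling code reads like tree navigation, yet no tree is ever built: each lookup advances an index through the text, and only the value passed to `get()` is decoded. A production design would also need input validation and a much faster way of skipping, which this sketch leaves out.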