JSON Schema is the de facto standard schema language for JSON data. The language went through many minor revisions, but the most recent versions added two novel features, dynamic references and annotation-dependent validation, that change the evaluation model. Modern JSON Schema denotes all versions from Draft 2019-09 onward, which are characterized by these new features, while Classical JSON Schema denotes the earlier versions. These "modern" features make the schema language quite difficult to understand and have generated many discussions about the correct interpretation of the official specifications; for this reason, we undertook the task of formalizing them. During this process, we also analyzed the complexity of data validation in Modern JSON Schema, expecting to confirm the PTIME complexity of Classical JSON Schema validation, and we were surprised to discover a completely different truth: data validation, which is expected to be an extremely efficient process, has PSPACE complexity once the Modern JSON Schema features are included. In this paper, we give the first formal description of Modern JSON Schema, which we consider a central contribution of this work. We then prove that its data validation problem is PSPACE-complete. We prove that the source of this complexity lies in dynamic references, and not in annotation-dependent validation. We study the schema and data complexities, showing that the problem is PSPACE-complete with respect to the schema size even with a fixed instance, but is in PTIME when the schema is fixed and only the instance size is allowed to vary. Finally, we run experiments showing that there are families of schemas for which the difference in asymptotic complexity between dynamic and static references is clearly visible, even with small schemas.