Common Foundations for Recursive Shape Languages

Shqiponja Ahmetaj, Iovka Boneva, Jan Hidders, Maxime Jakubowski, Jose-Emilio Labra-Gayo, Wim Martens, Fabio Mogavero, Filip Murlak, Cem Okulmus, Ognjen Savković, Mantas Šimkus, Dominik Tomaszuk

As schema languages for RDF data become more mature, we are seeing efforts to extend them with recursive semantics, applying diverse ideas from logic programming and description logics. While ShEx has an official recursive semantics based on greatest fixpoints (GFP), the discussion for SHACL is ongoing and seems to be converging towards least fixpoints (LFP). Our practical study confirms that ShEx validators implement GFP, whereas SHACL validators are more heterogeneous. This situation creates tension between ShEx and SHACL, as their semantic commitments appear to diverge, potentially undermining interoperability and predictability. We aim to clarify this design space by comparing the main semantic options in a principled yet accessible way, hoping to engage both theoreticians and practitioners, especially those involved in developing tools and standards. We present a unifying formal semantics that covers LFP, GFP, and supported model semantics (SMS), clarifying their relationships and highlighting a duality between LFP and GFP on stratified fragments. Next, we investigate to what extent the directions taken by SHACL and ShEx are compatible. We show that, although ShEx and SHACL seem to be heading in different directions, they include large fragments with identical expressive power. Moreover, these fragments correspond closely to each other through the aforementioned duality principle. Finally, we present a complete picture of the data and combined complexity of ShEx and SHACL validation under LFP, GFP, and SMS, showing that SMS comes at a higher computational cost under standard complexity-theoretic assumptions.
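To make the difference between LFP and GFP concrete, the following is a minimal sketch, not taken from the paper, of how the two semantics can disagree on a purely recursive shape. It assumes a hypothetical shape S that requires an outgoing `knows` edge to a node that itself satisfies S, and it models the graph as a plain set of (subject, object) pairs. The names `step`, `least_fixpoint`, and `greatest_fixpoint` are illustrative and do not correspond to any ShEx or SHACL API.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Edge = Tuple[str, str]  # (subject, object) of a hypothetical `knows` triple


def step(edges: Set[Edge], assigned: FrozenSet[str]) -> FrozenSet[str]:
    """Apply the operator for the assumed shape S := "has a knows-successor in S".

    Returns the nodes that satisfy S if `assigned` is taken as the current
    set of nodes satisfying S. The operator is monotone in `assigned`.
    """
    return frozenset(s for (s, o) in edges if o in assigned)


def least_fixpoint(edges: Set[Edge]) -> FrozenSet[str]:
    """Iterate upwards from the empty assignment until nothing changes."""
    current: FrozenSet[str] = frozenset()
    while True:
        nxt = step(edges, current)
        if nxt == current:
            return current
        current = nxt


def greatest_fixpoint(nodes: Iterable[str], edges: Set[Edge]) -> FrozenSet[str]:
    """Iterate downwards from the full assignment until nothing changes."""
    current: FrozenSet[str] = frozenset(nodes)
    while True:
        nxt = step(edges, current)
        if nxt == current:
            return current
        current = nxt


# Illustrative data: a and b know each other, c knows a, e knows d, d knows nobody.
nodes = {"a", "b", "c", "d", "e"}
edges = {("a", "b"), ("b", "a"), ("c", "a"), ("e", "d")}

lfp_result = least_fixpoint(edges)
gfp_result = greatest_fixpoint(nodes, edges)
print(sorted(lfp_result), sorted(gfp_result))
```

On this data the least fixpoint is empty, because no node can be shown to satisfy S without already assuming that some node does. The greatest fixpoint keeps a, b, and c, whose membership is supported by the cycle between a and b, and discards d and e once d is found to have no `knows` successor.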
