The categorical contours of the Chomsky-Schützenberger representation theorem

From arXiv: This is a thoroughly revised and expanded version of a paper with a similar title presented at the 38th Conference on the Mathematical Foundations of Programming Semantics (MFPS 2022). At the request of the LMCS referees, an Addendum included in a previous version of this paper but not in the original conference version has been moved into a separate paper (in preparation). arXiv admin note: text overlap with arXiv:2212.09060

We develop fibrational perspectives on context-free grammars and on nondeterministic finite-state automata over categories and operads. A generalized CFG is a functor from a free colored operad (aka multicategory) generated by a pointed finite species into an arbitrary base operad: this encompasses classical CFGs by taking the base to be a certain operad constructed from a free monoid, as an instance of a more general construction of an \emph{operad of spliced arrows} $\mathcal{W}\,\mathcal{C}$ for any category $\mathcal{C}$. A generalized NFA is a functor from an arbitrary bipointed category or pointed operad satisfying the unique lifting of factorizations and finite fiber properties: this encompasses classical word automata and tree automata without $\epsilon$-transitions, but also automata over non-free categories and operads. We show that generalized context-free and regular languages satisfy suitable generalizations of many of the usual closure properties, and in particular we give a simple conceptual proof that context-free languages are closed under intersection with regular languages. Finally, we observe that the splicing functor $\mathcal{W} : \mathrm{Cat} \to \mathrm{Oper}$ admits a left adjoint $\mathcal{C} : \mathrm{Oper} \to \mathrm{Cat}$, which we call the \emph{contour category} construction since the arrows of $\mathcal{C}\,\mathcal{O}$ have a geometric interpretation as oriented contours of operations of $\mathcal{O}$. A direct consequence of the contour / splicing adjunction is that every pointed finite species induces a universal CFG generating a language of \emph{tree contour words}. This leads us to a generalization of the Chomsky-Sch\"utzenberger Representation Theorem, establishing that a subset of a homset $L \subseteq \mathcal{C}(A,B)$ is a CFL of arrows if and only if it is a functorial image of the intersection of a $\mathcal{C}$-chromatic tree contour language with a regular language.
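
As an informal illustration of the classical case mentioned above, the following Python sketch treats a production with $n$ nonterminals on its right-hand side as a "spliced word" $w_0 - w_1 - \dots - w_n$, that is, $n+1$ terminal strings with $n$ gaps, and evaluates a derivation tree by filling each gap with the word generated by the corresponding subtree. This only covers the one-object case of a free monoid of strings; the names `Node`, `evaluate`, and `nest` are hypothetical and chosen for this example, and the code is not taken from the paper.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Node:
    """A derivation-tree node (illustrative, not the paper's notation).

    splice   -- the n + 1 terminal strings w0, w1, ..., wn of a production
    children -- the n subderivations that fill the gaps between them
    """
    splice: Tuple[str, ...]
    children: Tuple["Node", ...] = ()


def evaluate(node: Node) -> str:
    """Compose spliced words bottom-up to obtain the generated word."""
    if len(node.splice) != len(node.children) + 1:
        raise ValueError("a spliced word with n gaps needs n + 1 pieces")
    pieces = [node.splice[0]]
    for child, piece in zip(node.children, node.splice[1:]):
        pieces.append(evaluate(child))  # fill the gap with the child's word
        pieces.append(piece)            # then continue with the next piece
    return "".join(pieces)


def nest(depth: int) -> Node:
    """Derivation using S -> a S b (spliced word 'a' - 'b') depth times,
    closed off by S -> epsilon (the nullary spliced word '')."""
    tree = Node(splice=("",))
    for _ in range(depth):
        tree = Node(splice=("a", "b"), children=(tree,))
    return tree


if __name__ == "__main__":
    print(evaluate(nest(3)))
```

Here the production S → a S b corresponds to the unary spliced word `("a", "b")`, and evaluating a derivation tree plays the role of composing operations in the base operad. Under the assumptions above, `evaluate(nest(3))` yields the word `aaabbb`.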
