The Bar-Hillel construction is a classic result in formal language theory. It shows, by a simple construction, that the intersection of a context-free language and a regular language is itself context-free. In the construction, the regular language is specified by a finite-state automaton. However, neither the original construction (Bar-Hillel et al., 1961) nor its weighted extension (Nederhof and Satta, 2003) can handle finite-state automata with $\varepsilon$-arcs. While it is possible to remove $\varepsilon$-arcs from a finite-state automaton efficiently without modifying the language, such an operation modifies the automaton's set of paths. We give a construction that generalizes the Bar-Hillel construction to the case where the automaton has $\varepsilon$-arcs, and we further prove that the resulting grammar encodes the structure of both the input automaton and the input grammar while retaining the asymptotic size of the original construction.
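For orientation, the classic $\varepsilon$-free construction can be sketched as follows. This is a minimal illustration, not the generalized construction proposed here. It assumes an unweighted grammar in Chomsky normal form and an automaton given as a set of arcs; the function name `bar_hillel`, the tuple encodings, and the fresh start symbol `START'` are all hypothetical choices made for this example.

```python
from itertools import product


def bar_hillel(binary_rules, terminal_rules, start, states, arcs, initial, final):
    """Intersect a CNF grammar with an epsilon-free finite-state automaton.

    binary_rules:   set of (X, Y, Z) for rules X -> Y Z
    terminal_rules: set of (X, a)    for rules X -> a
    arcs:           set of (q, a, r) for automaton arcs q --a--> r
    Returns (new_start, rules). Each rule is a (head, body) pair, and the
    nonterminals of the product grammar are triples (q, X, r).
    """
    new_start = "START'"  # hypothetical fresh start symbol (an assumption)
    rules = set()
    # One start rule per pair of initial and final states.
    for qi, qf in product(initial, final):
        rules.add((new_start, ((qi, start, qf),)))
    # Binary rules: enumerate every choice of the three states q, r, s.
    for (x, y, z) in binary_rules:
        for q, r, s in product(states, repeat=3):
            rules.add(((q, x, s), ((q, y, r), (r, z, s))))
    # Terminal rules: one copy per arc carrying the same symbol.
    for (x, a) in terminal_rules:
        for (q, b, r) in arcs:
            if a == b:
                rules.add(((q, x, r), (a,)))
    return new_start, rules


# Grammar for the single string "ab", intersected with an automaton accepting "ab".
new_start, rules = bar_hillel(
    binary_rules={("S", "A", "B")},
    terminal_rules={("A", "a"), ("B", "b")},
    start="S",
    states={0, 1, 2},
    arcs={(0, "a", 1), (1, "b", 2)},
    initial={0},
    final={2},
)
```

Each binary rule is copied once per state triple, so the product grammar in this sketch has on the order of $|P| \cdot |Q|^3$ rules, where $P$ is the rule set and $Q$ the state set. Note also that the sketch only pairs arcs with terminal rules, so an $\varepsilon$-arc would have nothing to pair with.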