Motivated by recent connections to factorised databases, we analyse the efficiency of representations by context-free grammars (CFGs). Concretely, we prove a recent conjecture by Kimelfeld, Martens, and Niewerth (ICDT 2025) that, for finite languages, representations by general CFGs can be doubly exponentially smaller than those by unambiguous CFGs. To do so, we prove the first exponential lower bound on the size of unambiguous CFGs representing a finite language that can be represented efficiently by CFGs. Our proof first reduces the problem to proving a lower bound in a non-standard model of communication complexity. We then establish the required communication complexity lower bound with an argument similar in spirit to a recent discrepancy argument. Our result also implies that a finite language may admit an exponentially smaller representation as a nondeterministic finite automaton than as an unambiguous CFG.
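To make the central notion concrete: a CFG is unambiguous when every word it generates has exactly one parse tree. The Python sketch below is our own illustration, not the paper's construction. The grammar encoding and the names `derivation_counts` and `is_unambiguous` are hypothetical. It counts the parse trees of every word generated by an acyclic grammar, which always generates a finite language. It is exhaustive and only suitable for toy grammars.

```python
from collections import Counter
from itertools import product

# Hypothetical encoding: each nonterminal maps to a list of right-hand sides,
# each a tuple of symbols. Symbols that are not keys are terminals.
# The grammar is assumed acyclic, so its language is finite and the
# recursion below terminates.
Grammar = dict[str, list[tuple[str, ...]]]


def derivation_counts(grammar: Grammar, symbol: str) -> Counter:
    """Map each word derivable from `symbol` to its number of parse trees."""
    if symbol not in grammar:  # terminal symbol derives only itself
        return Counter({symbol: 1})
    counts: Counter = Counter()
    for rhs in grammar[symbol]:
        # Concatenate the children's words left to right; the number of
        # parse trees of a concatenation is the product of the children's counts.
        partial: Counter = Counter({"": 1})
        for child in rhs:
            child_counts = derivation_counts(grammar, child)
            combined: Counter = Counter()
            for (w1, c1), (w2, c2) in product(partial.items(), child_counts.items()):
                combined[w1 + w2] += c1 * c2
            partial = combined
        counts.update(partial)  # Counter.update adds counts across rules
    return counts


def is_unambiguous(grammar: Grammar, start: str) -> bool:
    """True iff every word of the language has exactly one parse tree."""
    return all(c == 1 for c in derivation_counts(grammar, start).values())


# "ab" can be derived via S -> a X or via S -> Y b, so this grammar is ambiguous.
ambiguous = {"S": [("a", "X"), ("Y", "b")], "X": [("b",)], "Y": [("a",)]}
print(derivation_counts(ambiguous, "S"), is_unambiguous(ambiguous, "S"))

# Each level doubles the word length with one new rule, so nonterminals
# A0..An generate all 2^(2^n) words of length 2^n over {a, b}, unambiguously.
doubling = {"A0": [("a",), ("b",)], "A1": [("A0", "A0")], "A2": [("A1", "A1")]}
print(len(derivation_counts(doubling, "A2")), is_unambiguous(doubling, "A2"))
```

The second grammar shows why the size of the grammar, rather than the number of words, is the natural measure here: a grammar with few rules can already describe a doubly exponentially large finite language. It does not illustrate the separation proved in the paper, since it is itself unambiguous.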