Several algorithms are known for efficiently computing the prefix probability of a string under a probabilistic context-free grammar (PCFG). Efficient algorithms for this problem run in time cubic in the length of the input string. However, some proposed algorithms are suboptimal with respect to the size of the grammar. This paper proposes a novel speed-up of Jelinek and Lafferty's (1991) algorithm. The original algorithm runs in $\mathcal{O}(N^3 |\mathcal{N}|^3 + |\mathcal{N}|^4)$, where $N$ is the input length and $|\mathcal{N}|$ is the number of non-terminals in the grammar. In contrast, our speed-up runs in $\mathcal{O}(N^2 |\mathcal{N}|^3 + N^3 |\mathcal{N}|^2)$.
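
To get a feel for how the two bounds scale, the following minimal Python sketch evaluates both expressions with all constant factors dropped. The function names and the sample values of $N$ and $|\mathcal{N}|$ are illustrative assumptions, not taken from the paper, and the resulting figures indicate asymptotic growth only, not measured running times.

```python
def jelinek_lafferty_bound(n: int, k: int) -> int:
    """N^3 |N|^3 + |N|^4, the bound stated for the original algorithm.

    n is the input length N, k is the number of non-terminals |N|.
    Constant factors are ignored (assumption for illustration only).
    """
    return n**3 * k**3 + k**4


def speedup_bound(n: int, k: int) -> int:
    """N^2 |N|^3 + N^3 |N|^2, the bound stated for the proposed speed-up."""
    return n**2 * k**3 + n**3 * k**2


if __name__ == "__main__":
    # Hypothetical (input length, grammar size) pairs chosen for illustration.
    for n, k in [(10, 100), (50, 100), (50, 1000)]:
        old = jelinek_lafferty_bound(n, k)
        new = speedup_bound(n, k)
        print(f"N={n:>3} |N|={k:>5}  original={old:.3e}  "
              f"speed-up={new:.3e}  ratio={old / new:.1f}")
```

The ratio of the two expressions grows with both the input length and the grammar size, which is consistent with the abstract's point that the original algorithm is suboptimal with respect to grammar size.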