This paper offers a fresh look at the pumping lemma constant as an upper bound on the information required for learning context-free grammars. An objective function based on indirect negative evidence considers the occurrences, and non-occurrences, of a finite number of strings encountered after a sufficiently long presentation. This function has optimal substructure over the hypothesis space, giving rise to a greedy search learner embedded in a branch-and-bound method. A hierarchy of learnable classes is defined in terms of the number of production rules that must be added to interim solutions in order to incrementally fit the input. Efficiency depends strongly on the position of the target grammar in this hierarchy and on the richness of the input.
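The learning procedure the abstract outlines can be illustrated with a toy sketch: starting from an empty rule set, a greedy learner repeatedly adds the production that most improves an objective rewarding observed strings and penalizing generated-but-unobserved ones (the indirect negative evidence), with a fixed cost per rule. All identifiers, the bounded-derivation membership test, and the particular scoring function below are illustrative assumptions, not the paper's actual formulation.

```python
from itertools import product

def generates(rules, s, max_depth=8):
    """Bounded leftmost-derivation search: does the CFG (start symbol 'S',
    uppercase = nonterminal, no epsilon-rules) derive string s?"""
    def derive(form, depth):
        if depth > max_depth:
            return False
        for i, sym in enumerate(form):
            if sym.isupper():  # leftmost nonterminal
                for lhs, rhs in rules:
                    if lhs == sym:
                        new = form[:i] + list(rhs) + form[i + 1:]
                        # without epsilon-rules, forms longer than s are dead ends
                        if len(new) <= len(s) and derive(new, depth + 1):
                            return True
                return False
        return ''.join(form) == s
    return derive(['S'], 0)

def greedy_learn(candidates, positives, universe, rule_cost=0.1):
    """Greedily add candidate rules while the objective improves.
    Objective: covered positives minus generated non-occurring strings
    (indirect negative evidence) minus a per-rule cost."""
    def score(rules):
        gen = {s for s in universe if generates(rules, s)}
        return len(gen & positives) - len(gen - positives) - rule_cost * len(rules)
    current, best = [], 0.0  # the empty grammar scores 0
    improved = True
    while improved:
        improved = False
        for r in candidates:
            if r in current:
                continue
            trial = current + [r]
            sc = score(trial)
            if sc > best:
                current, best, improved = trial, sc, True
    return set(current)

# Toy presentation: positives from a^n b^n, universe = all short {a,b} strings.
universe = {''.join(p) for n in range(1, 5) for p in product('ab', repeat=n)}
positives = {'ab', 'aabb'}
candidates = [('S', 'ab'), ('S', 'aSb'), ('S', 'a'), ('S', 'b')]
```

On this toy input the learner settles on `{('S','ab'), ('S','aSb')}`, i.e. a grammar for a^n b^n: the recursive rule covers `aabb` without generating any string absent from the presentation, while rules like `S -> a` are rejected because they produce unobserved strings.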