We revisit the problem of minimal local grammar-based coding. In this setting, the local grammar encoder encodes grammars symbol by symbol, whereas the minimal grammar transform minimizes the grammar length within a preset class of grammars, where length is measured by the length of the local grammar encoding. It is known that such minimal codes are strongly universal for a strictly positive entropy rate, and that the number of rules in the minimal grammar is an upper bound for the mutual information of the source. Although the fully minimal code is likely intractable, the constrained minimal block code can be computed efficiently. In this article, we present a new, simpler, and more general proof of strong universality of the minimal block code, which holds regardless of the entropy rate. The proof rests on a simple Zipfian bound for ranked probabilities. As a side result, we show empirically that the number of rules in the minimal block code cannot clearly discriminate between long-memory and memoryless sources, such as an English text and a random permutation of its characters. This contradicts our previous expectations.
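The abstract does not state the Zipfian bound explicitly. A standard inequality of this kind, assumed here for illustration, is that probabilities sorted in descending order satisfy p_k <= 1/k, because k * p_k <= p_1 + ... + p_k <= 1. The Python sketch below checks this inequality on empirical block frequencies of a toy string and of a random permutation of its characters. The function names, the toy string, and the block length are all hypothetical choices, and the sketch does not implement the minimal block code itself.

```python
import random
from collections import Counter


def ranked_probabilities(text, block_len=1):
    """Empirical probabilities of non-overlapping blocks, in descending order."""
    blocks = [text[i:i + block_len]
              for i in range(0, len(text) - block_len + 1, block_len)]
    counts = Counter(blocks)
    total = sum(counts.values())
    return sorted((c / total for c in counts.values()), reverse=True)


def satisfies_zipfian_bound(probs, tol=1e-12):
    """Check p_k <= 1/k for every rank k = 1, 2, ... (assumed form of the bound)."""
    return all(p <= 1.0 / k + tol for k, p in enumerate(probs, start=1))


def shuffle_characters(text, seed=0):
    """Return a random permutation of the characters of `text`."""
    chars = list(text)
    random.Random(seed).shuffle(chars)
    return "".join(chars)


# Toy stand-in for an English text (hypothetical sample, not the paper's data).
sample = "the quick brown fox jumps over the lazy dog " * 20
shuffled = shuffle_characters(sample)

for name, source in (("original", sample), ("shuffled", shuffled)):
    probs = ranked_probabilities(source, block_len=2)
    print(name, len(probs), satisfies_zipfian_bound(probs))
```

The bound holds for any probability distribution, so both strings pass the check. What differs between the original and the shuffled string is the number and shape of the ranked block frequencies, not the validity of the inequality.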