Kleinberg and Mullainathan (2024) recently proposed an interesting model for language generation in the limit: given a countable collection of languages, and an adversary enumerating the strings of some language $L$ from the collection, the objective is to generate new strings from the target language such that all strings generated beyond some finite time are valid. Li, Raman and Tewari (2024) and Charikar and Pabbaraju (2024) showed strong non-uniform generation guarantees in this model, giving algorithms that generate new valid strings from $L$ after seeing a number $t(L)$ of distinct input strings that depends only on $L$ (and the collection), but not on the enumeration order. However, in both of these works, the language-wise generation times $t(L)$ of the algorithm can be strictly sub-optimal. In this work, we study Pareto-optimality of non-uniform language generation in the limit. We propose an algorithm whose generation times $t^\star(L)$ are (almost) Pareto-optimal: any other algorithm whose generation time for some language $L$ is strictly smaller than $t^\star(L)$ must have a generation time for some other language $L'$ that is strictly worse than $t^\star(L')$. Pareto-optimality is essentially the best that one can achieve for non-uniform generation. Our algorithmic framework also adapts to give Pareto-optimal non-uniform generation algorithms in the practically motivated settings of noisy and representative generation.
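The Pareto-optimality condition above can be read as "no other vector of generation times dominates $t^\star$": there is no algorithm with $t(L) \le t^\star(L)$ for every $L$ and $t(L) < t^\star(L)$ for at least one $L$. The sketch below checks this condition, assuming a finite set of languages and a finite list of candidate generation-time vectors; the real setting ranges over a countable collection and over all algorithms, and the paper's guarantee is only "almost" Pareto-optimal, so this is an illustration of the definition rather than the authors' algorithm. The function names `dominates` and `is_pareto_optimal` and the example times are hypothetical.

```python
from typing import Dict, Hashable, Iterable

# Hypothetical representation: language identifier -> generation time t(L).
GenTimes = Dict[Hashable, int]


def dominates(t: GenTimes, t_star: GenTimes) -> bool:
    """True if t is no slower than t_star on every language and strictly faster on one."""
    langs = t_star.keys()
    no_worse = all(t[L] <= t_star[L] for L in langs)
    strictly_better = any(t[L] < t_star[L] for L in langs)
    return no_worse and strictly_better


def is_pareto_optimal(t_star: GenTimes, alternatives: Iterable[GenTimes]) -> bool:
    """t_star is Pareto-optimal among `alternatives` if none of them dominates it."""
    return not any(dominates(t, t_star) for t in alternatives)


if __name__ == "__main__":
    # Made-up generation times for three hypothetical languages.
    t_star = {"L1": 2, "L2": 3, "L3": 5}
    a = {"L1": 1, "L2": 4, "L3": 5}  # faster on L1 but slower on L2: a trade-off
    b = {"L1": 2, "L2": 3, "L3": 6}  # slower on L3, never faster
    c = {"L1": 1, "L2": 3, "L3": 5}  # faster on L1, no worse elsewhere
    print(is_pareto_optimal(t_star, [a, b]))     # True
    print(is_pareto_optimal(t_star, [a, b, c]))  # False: c dominates t_star
```

The alternative `a` mirrors the abstract's statement: improving on $t^\star(L_1)$ forces a strictly worse time on another language, so it does not dominate $t^\star$.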