Pareto-optimal Non-uniform Language Generation

Kleinberg and Mullainathan (2024) recently proposed a model for language generation in the limit: given a countable collection of languages and an adversary enumerating the strings of some language $L$ from the collection, the objective is to generate new strings from the target language such that all strings generated beyond some finite time are valid. Li, Raman and Tewari (2024) and Charikar and Pabbaraju (2024) showed strong non-uniform generation guarantees in this model, giving algorithms that generate new valid strings from $L$ after seeing a number of distinct input strings $t(L)$ that depends only on $L$ (and the collection), but not on the enumeration order. However, in both of these works, the language-wise generation times $t(L)$ of the algorithm can be strictly sub-optimal. In this work, we study Pareto-optimality of non-uniform language generation in the limit. We propose an algorithm whose generation times $t^\star(L)$ are (almost) Pareto-optimal: any other algorithm whose generation time for some language $L$ is strictly smaller than $t^\star(L)$ must have a generation time for some other language $L'$ that is strictly worse than $t^\star(L')$. Pareto-optimality is essentially the best that one can achieve for non-uniform generation. Our algorithmic framework also adapts to give Pareto-optimal non-uniform generation algorithms in the practically motivated settings of noisy generation and representative generation.
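The Pareto-optimality condition stated above can be illustrated with a small sketch. It is only a toy: the collection here is finite, the language names and generation times are made-up numbers, and the helper `dominates` is a hypothetical name rather than anything from the paper. It checks whether one profile of generation times is strictly smaller on some language without being strictly worse on any other, which is exactly what a Pareto-optimal profile rules out.

```python
from typing import Dict


def dominates(t_other: Dict[str, int], t_star: Dict[str, int]) -> bool:
    """Return True if t_other is strictly smaller than t_star on at least one
    language and not larger on any language (i.e. t_other Pareto-dominates t_star)."""
    if t_other.keys() != t_star.keys():
        raise ValueError("both profiles must cover the same languages")
    strictly_better_somewhere = any(t_other[lang] < t_star[lang] for lang in t_star)
    never_worse = all(t_other[lang] <= t_star[lang] for lang in t_star)
    return strictly_better_somewhere and never_worse


# Hypothetical generation times (number of distinct strings seen before
# generation is guaranteed valid) for a toy collection of three languages.
t_star = {"L1": 2, "L2": 3, "L3": 5}

# Faster on L1 but slower on L2: allowed by Pareto-optimality of t_star.
t_a = {"L1": 1, "L2": 4, "L3": 5}

# Faster on L2 and never slower: would contradict Pareto-optimality of t_star.
t_b = {"L1": 2, "L2": 2, "L3": 5}

print(dominates(t_a, t_star))  # False
print(dominates(t_b, t_star))  # True
```

In this toy setting, a profile $t^\star$ is Pareto-optimal when no achievable profile dominates it in the sense of `dominates`. The abstract's guarantee is over countable collections and is qualified as "(almost)" Pareto-optimal, which this finite check does not capture.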

