Sampling is a common strategy for generating text from probabilistic models, yet standard ancestral sampling often results in text that is incoherent or ungrammatical. To alleviate this issue, various modifications to a model's sampling distribution, such as nucleus or top-k sampling, have been introduced and are now ubiquitously used in language generation systems. We propose a unified framework for understanding these techniques, which we term sampling adapters. Sampling adapters often lead to qualitatively better text, which raises two questions: From a formal perspective, how do they change the (sub)word-level distributions of language generation models? And why do these local changes lead to higher-quality text? We argue that the shift they enforce can be viewed as a trade-off between precision and recall: while the model loses its ability to produce certain strings, its precision on desirable text increases. While this trade-off is not reflected in standard metrics of distribution quality (such as perplexity), we find that several precision-emphasizing measures indeed indicate that sampling adapters can lead to probability distributions more aligned with the true distribution. Further, these measures correlate with higher sequence-level quality scores, specifically Mauve.
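To make the adapter view and the precision/recall argument concrete, here is a minimal sketch built on our own assumptions rather than the paper's code. An adapter is modeled as a function from a next-token probability list to a new, renormalized one; `top_k` and `nucleus` are illustrative implementations of the two techniques named above. `cross_entropy` (the quantity behind perplexity) stands in for a recall-sensitive metric, and `reverse_kl`, KL(q ‖ p_true), is used only as one example of a precision-emphasizing measure. It is not necessarily one of the measures the paper evaluates. All function names and the toy distributions are hypothetical.

```python
import math
from typing import Callable, List

# Assumption: an "adapter" is any map from a next-token distribution to a new one.
Adapter = Callable[[List[float]], List[float]]


def _renormalize(probs: List[float]) -> List[float]:
    total = sum(probs)
    return [p / total for p in probs]


def top_k(k: int) -> Adapter:
    """Keep only the k most probable tokens, then renormalize."""
    def adapt(probs: List[float]) -> List[float]:
        ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
        keep = set(ranked[:k])
        return _renormalize([p if i in keep else 0.0 for i, p in enumerate(probs)])
    return adapt


def nucleus(top_p: float) -> Adapter:
    """Keep the smallest set of most probable tokens whose mass reaches top_p, then renormalize."""
    def adapt(probs: List[float]) -> List[float]:
        ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
        keep, mass = set(), 0.0
        for i in ranked:
            keep.add(i)
            mass += probs[i]
            if mass >= top_p:
                break
        return _renormalize([p if i in keep else 0.0 for i, p in enumerate(probs)])
    return adapt


def cross_entropy(p_true: List[float], q: List[float]) -> float:
    """H(p_true, q): infinite as soon as q assigns zero mass to a token p_true can emit."""
    total = 0.0
    for p, qi in zip(p_true, q):
        if p > 0:
            if qi == 0:
                return math.inf
            total -= p * math.log(qi)
    return total


def reverse_kl(q: List[float], p_true: List[float]) -> float:
    """KL(q || p_true): penalizes q for putting mass where p_true is small."""
    total = 0.0
    for qi, p in zip(q, p_true):
        if qi > 0:
            if p == 0:
                return math.inf
            total += qi * math.log(qi / p)
    return total


if __name__ == "__main__":
    # Hypothetical toy numbers: the model over-weights tokens 3 and 4,
    # which are rare under the "true" distribution.
    p_true = [0.60, 0.30, 0.08, 0.01, 0.01]
    model = [0.50, 0.25, 0.10, 0.08, 0.07]
    adapted = top_k(3)(model)

    # Cross-entropy (hence perplexity) becomes infinite: recall is lost.
    print("cross-entropy:", cross_entropy(p_true, model), cross_entropy(p_true, adapted))
    # The precision-oriented divergence decreases for the adapted distribution.
    print("reverse KL:   ", reverse_kl(model, p_true), reverse_kl(adapted, p_true))
```

In this toy case, truncating the tail removes tokens 3 and 4 entirely, so the adapted model can no longer produce strings containing them and its cross-entropy against `p_true` is infinite. The mass it keeps, however, is placed closer to where `p_true` puts its mass, which the reverse KL reflects.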