Maximum likelihood estimation (MLE) is a statistical method used to estimate the parameters of a probability distribution that best explain the observed data. In the context of text generation, MLE is often used to train generative language models, which can then be used to generate new text. However, we argue that MLE is not always necessary or optimal, especially for closed-ended text generation tasks like machine translation. In these tasks, the goal of the model is to generate the most appropriate response, which does not necessarily require estimating the entire data distribution with MLE. To this end, we propose a novel class of training objectives based on convex functions, which enables text generation models to focus on highly probable outputs without having to estimate the entire data distribution. We investigate the theoretical properties of the optimal predicted distribution when applying convex functions to the loss, demonstrating that convex functions can sharpen the optimal distribution, thereby enabling the model to better capture outputs with high probabilities. Experiments on various text generation tasks and models show the effectiveness of our approach. It enables autoregressive models to bridge the gap between greedy and beam search, and facilitates the learning of non-autoregressive models with a maximum improvement of 9+ BLEU points. Moreover, our approach has a significant impact on large language models (LLMs), substantially enhancing their generative capability on various tasks. Source code is available at \url{https://github.com/ictnlp/Convex-Learning}.
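The sharpening claim can be illustrated on a toy problem. The sketch below is not the authors' implementation. It assumes the objective has the form sum_i p_data[i] * f(q[i]), maximized over the predicted distribution q, and it uses f(x) = x^2 as a stand-in convex function. The names `expected_score`, `best_on_grid`, and the three-outcome distribution `p_data` are all hypothetical and chosen only for illustration. With f = log (the MLE case) the best q on the grid reproduces the data distribution, while the convex choice puts all mass on the most probable outcome.

```python
import math

def expected_score(p, q, f):
    """Toy objective sum_i p[i] * f(q[i]); training would maximize it over q."""
    return sum(pi * f(qi) for pi, qi in zip(p, q))

def best_on_grid(p, f, steps=100):
    """Brute-force the maximizing q over a grid on the 3-outcome simplex."""
    best_q, best_val = None, -math.inf
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            k = steps - i - j
            q = (i / steps, j / steps, k / steps)
            val = expected_score(p, q, f)
            if val > best_val:
                best_q, best_val = q, val
    return best_q

# Hypothetical data distribution over three candidate outputs.
p_data = (0.5, 0.3, 0.2)

# MLE corresponds to f = log (concave); log(0) is treated as -inf.
log_f = lambda x: math.log(x) if x > 0 else -math.inf
# An illustrative convex choice, assumed here for the demonstration only.
square_f = lambda x: x * x

q_mle = best_on_grid(p_data, log_f)        # matches p_data
q_convex = best_on_grid(p_data, square_f)  # one-hot on the mode of p_data

print("f = log :", q_mle)
print("f = x^2 :", q_convex)
```

In this toy setting the convex objective ignores the lower-probability outcomes entirely, which matches the abstract's description of focusing on highly probable outputs without estimating the full distribution.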