In text generation, a large language model (LM) chooses each new word using the softmax function, based only on the previously generated context. However, word co-occurrence statistics drawn from a scene-specific corpus are also valuable when choosing the next word, because they help keep the topic of the generated text aligned with the current task. To fully exploit this co-occurrence information, we propose a graphmax function for task-specific text generation. Through graph-based regularization, graphmax makes the final word choice depend on both the global knowledge in the LM and the local knowledge from the scene-specific corpus. The traditional softmax function is regularized with a graph total variation (GTV) term, which incorporates the local knowledge into the LM and encourages the model to account for the statistical relationships between words in the scene-specific corpus. The proposed graphmax is versatile and can be readily plugged into any large pre-trained LM for text generation and machine translation. Through extensive experiments, we demonstrate that the new GTV-based regularization can improve performance on various natural language processing tasks compared with existing methods. Moreover, human experiments show that participants can easily distinguish text generated by graphmax from text generated by softmax.
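To make the idea of a GTV-regularized softmax concrete, here is a minimal NumPy sketch built on several assumptions that the abstract does not state. First, softmax is viewed as the maximizer of <z, p> + H(p) over the probability simplex, where z are the LM logits and H is the entropy. Second, graphmax is assumed to add a penalty lam * GTV(p), with GTV(p) = 1/2 * sum_ij W[i, j] * |p[i] - p[j]| for a symmetric co-occurrence weight matrix W built from the scene-specific corpus. Third, the resulting non-smooth problem is solved approximately here by entropic mirror descent with a subgradient of GTV, which is a generic solver swapped in for illustration. The functions `softmax`, `graph_total_variation`, and `graphmax`, the parameters `lam`, `step`, and `n_iter`, and the toy vocabulary are all hypothetical, not the authors' implementation.

```python
import numpy as np


def softmax(z):
    """Standard softmax over a 1-D array of logits."""
    e = np.exp(z - z.max())
    return e / e.sum()


def graph_total_variation(p, W):
    """GTV(p) = 1/2 * sum_{i,j} W[i, j] * |p[i] - p[j]| for a symmetric weight matrix W."""
    return 0.5 * np.sum(W * np.abs(p[:, None] - p[None, :]))


def graphmax(z, W, lam, step=0.5, n_iter=200):
    """Hypothetical sketch: approximately minimize
        -<z, p> + sum_i p_i * log(p_i) + lam * GTV(p)
    over the probability simplex with entropic mirror descent.
    With lam = 0 the minimizer is exactly softmax(z).
    """
    log_p = np.full(z.shape, -np.log(z.size))  # start from the uniform distribution
    for _ in range(n_iter):
        p = np.exp(log_p)
        # One valid subgradient of GTV: g[i] = sum_j W[i, j] * sign(p[i] - p[j]).
        g = np.sum(W * np.sign(p[:, None] - p[None, :]), axis=1)
        # Mirror-descent step in log space: p <- p^(1 - step) * exp(step * (z - lam * g)).
        log_p = (1.0 - step) * log_p + step * (z - lam * g)
        log_p -= np.logaddexp.reduce(log_p)  # renormalize onto the simplex
    return np.exp(log_p)


# Toy example (hypothetical data): a 3-word vocabulary in which the LM favors
# word 0, and words 0 and 1 co-occur in the scene-specific corpus.
z = np.array([2.0, 0.0, 0.0])
W = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
p_soft = softmax(z)
p_graph = graphmax(z, W, lam=0.5)
print("softmax :", p_soft.round(3), "GTV:", graph_total_variation(p_soft, W).round(3))
print("graphmax:", p_graph.round(3), "GTV:", graph_total_variation(p_graph, W).round(3))
```

With these toy inputs, word 1, which co-occurs with the top-scoring word 0, receives noticeably more probability than under softmax, and the GTV of the output distribution falls. Setting lam=0 recovers softmax(z) up to iteration error. The absolute differences in GTV make the objective non-smooth, which is why the sketch uses a subgradient and a damped (step < 1) update rather than a closed form.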