By training to predict the next token in an unlabeled corpus, large language models learn to perform many tasks without any labeled data. However, their next-token-prediction objective arguably limits their performance in scenarios that require planning, such as writing a coherent article. In this paper, we train a module that plans the future writing process via a self-supervised learning objective. Given the textual context, this planning module learns to predict future abstract writing actions, which correspond to centroids in a clustered text embedding space. By conditioning on these actions, our model extends the successful language modeling formula to more abstract planning in an unsupervised way. Empirically, we demonstrate that our method improves language modeling performance overall, particularly with respect to text structure. Because our framework uses a planner module that is unsupervised and external to the language model, new planner modules can be trained at large scale and shared easily with the community.
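The self-supervised planning signal described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy 2-D "sentence embeddings", the simple k-means routine, and the deterministic initialization are all assumptions made for the demo. Each sentence is assigned to a cluster, the cluster centroids play the role of abstract writing actions, and the planner's training target for sentence *t* is the action (cluster id) of sentence *t+1*.

```python
import numpy as np

def kmeans(X, init_idx, iters=10):
    """Tiny k-means: deterministic init from given row indices (toy choice)."""
    centroids = X[init_idx].copy()
    for _ in range(iters):
        # Assign each embedding to its nearest centroid.
        labels = np.argmin(((X[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
        # Move each centroid to the mean of its assigned points.
        for j in range(len(centroids)):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

# Toy stand-in for sentence embeddings: two well-separated groups in 2-D.
# In the actual method these would come from a real text encoder.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])

# Cluster the embedding space; centroids = abstract "writing actions".
centroids, labels = kmeans(X, init_idx=[0, 3])

# Self-supervised planner targets: given sentence t, predict the abstract
# action of sentence t+1 (no labeled data involved).
targets = labels[1:]
```

A real pipeline would replace the toy embeddings with encoder outputs over a large corpus and train a sequence model on `(context, targets)` pairs; the language model is then conditioned on the predicted action.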