The past few decades have witnessed significant advances in the design of machine learning algorithms, from early studies on task-specific shallow models to more general deep Large Language Models (LLMs). Although existing models show promising results on tasks that require instant prediction or in-context learning, they lack the ability to learn continually and to transfer their temporal in-context knowledge effectively into their long-term parameters. Inspired by the human learning process, we introduce a "Sleep" paradigm that allows models to learn continually, distill their fragile short-term memories into stable long-term knowledge through replay, and recursively improve themselves through a "Dreaming" process. In more detail, sleep consists of two stages: (1) Memory Consolidation: an upward distillation process, called Knowledge Seeding, in which the memories of a smaller self are distilled into a larger network, providing more capacity while preserving the knowledge. As a proof of concept, we present a new Generalized Distillation process for Knowledge Seeding (i.e., the combination of on-policy distillation with Reinforcement Learning (RL)-based imitation learning); (2) Dreaming: a self-improvement phase in which the model uses RL to generate a curriculum of synthetic data to rehearse new knowledge and refine existing capabilities without human supervision. Our experiments on long-horizon, continual learning, knowledge incorporation, and few-shot generalization tasks support the importance of the sleep stage.
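The abstract does not give the Generalized Distillation objective, so the following is only a minimal sketch of the on-policy distillation ingredient under stated assumptions: the larger network (the student) samples its own continuation, and each step is scored by the reverse KL divergence against the smaller self (the teacher), which is one common form of on-policy distillation. The names `toy_teacher`, `toy_student`, `on_policy_distill_loss`, and the four-token vocabulary are hypothetical and purely illustrative; the RL-based imitation component is not modelled here.

```python
import math
import random

VOCAB = 4  # hypothetical toy vocabulary size


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reverse_kl(student_logits, teacher_logits):
    """KL(student || teacher) for one next-token distribution."""
    p = softmax(student_logits)
    q = softmax(teacher_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def on_policy_distill_loss(student_fn, teacher_fn, prompt, steps, rng):
    """Sample a continuation from the student and average the per-step
    reverse KL against the teacher on the student's own trajectory."""
    tokens = list(prompt)
    total = 0.0
    for _ in range(steps):
        s_logits = student_fn(tokens)
        t_logits = teacher_fn(tokens)
        total += reverse_kl(s_logits, t_logits)
        # On-policy: the next token is drawn from the student, not the teacher.
        probs = softmax(s_logits)
        next_token = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        tokens.append(next_token)
    return total / steps, tokens


def toy_teacher(tokens):
    """Assumed 'smaller self': prefers the token following the last one."""
    last = tokens[-1]
    return [2.0 if i == (last + 1) % VOCAB else 0.0 for i in range(VOCAB)]


def toy_student(tokens):
    """Assumed 'larger network' before distillation: uniform predictions."""
    return [0.0] * VOCAB


if __name__ == "__main__":
    loss, sampled = on_policy_distill_loss(
        toy_student, toy_teacher, prompt=[0], steps=5, rng=random.Random(0)
    )
    print("mean reverse KL:", loss)
    print("student-sampled tokens:", sampled)
```

In a real training loop the student's parameters would be updated to reduce this loss; the sketch only computes the quantity, to show where the student-generated samples enter.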