When do machine learning systems fail to generalize, and what mechanisms could improve their generalization? Here, we draw inspiration from cognitive science to argue that one weakness of parametric machine learning systems is their failure to exhibit latent learning -- learning information that is not relevant to the task at hand, but that might be useful in a future task. We show how this perspective links failures ranging from the reversal curse in language modeling to new findings on agent-based navigation. We then highlight how cognitive science points to episodic memory as a potential part of the solution to these issues. Correspondingly, we show that a system with an oracle retrieval mechanism can use learning experiences more flexibly to generalize better across many of these challenges. We also identify some of the essential components for effectively using retrieval, including the importance of within-example in-context learning for acquiring the ability to use information across retrieved examples. In summary, our results illustrate one possible contributor to the relative data inefficiency of current machine learning systems compared to natural intelligence, and help explain how retrieval methods can complement parametric learning to improve generalization. We close by discussing some of the links between these findings and prior results in cognitive science and neuroscience, and the broader implications.
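To make the retrieval idea concrete, the following is a minimal, hypothetical Python sketch of an oracle retrieval mechanism of the kind described above: relevance is given by ground-truth tags rather than a learned retriever, and the retrieved training examples are placed in-context for a reader to use. The names (`TRAIN_SET`, `oracle_retrieve`, `build_prompt`) and the entity-tag scheme are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of oracle retrieval, not the paper's implementation.
# Each training example is tagged with the entities it concerns; the oracle
# uses these ground-truth tags directly instead of a learned similarity score.
TRAIN_SET = [
    ({"Mary Lee Pfeiffer", "Tom Cruise"},
     "Mary Lee Pfeiffer is the mother of Tom Cruise."),
    ({"Paris", "France"},
     "Paris is the capital of France."),
]

def oracle_retrieve(query_entities, corpus):
    """Return every example whose ground-truth entity tags overlap the query's."""
    return [text for tags, text in corpus if tags & query_entities]

def build_prompt(query, retrieved):
    """Stand-in for an in-context reader: a real system would condition a
    language model on the retrieved examples followed by the query."""
    return "\n".join(retrieved) + "\nQ: " + query + "\nA:"

# The fact appears in training only in the forward direction, so a purely
# parametric learner typically fails this reversed query (the reversal curse).
# With the forward-form fact retrieved into context, answering reduces to
# within-example in-context reading.
query = "Who is the son of Mary Lee Pfeiffer?"
retrieved = oracle_retrieve({"Mary Lee Pfeiffer"}, TRAIN_SET)
print(build_prompt(query, retrieved))
```

The design choice this toy example highlights is the division of labor: the parametric learner need not have encoded the reversed fact in its weights, so long as retrieval can surface the relevant experience and the model can exploit it in-context.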