Large language models are increasingly solving tasks that are commonly believed to require human-level reasoning ability. However, these models still perform very poorly on benchmarks of general intelligence such as the Abstraction and Reasoning Corpus (ARC). In this paper, we approach ARC as a programming-by-examples problem and introduce a novel and scalable method for language model self-improvement called Code Iteration (CodeIt). Our method iterates between 1) program sampling and hindsight relabeling, and 2) learning from prioritized experience replay. By relabeling the goal of an episode (i.e., the target program output for a given input) to the output actually produced by the sampled program, our method effectively deals with the extreme sparsity of rewards in program synthesis. Applying CodeIt to the ARC dataset, we demonstrate that prioritized hindsight replay, along with pre-training and data augmentation, leads to successful inter-task generalization. CodeIt is the first neuro-symbolic approach that scales to the full ARC evaluation dataset. Our method solves 15% of ARC evaluation tasks, achieving state-of-the-art performance and outperforming existing neural and symbolic baselines.
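
The two alternating stages described above can be illustrated with a minimal sketch. It is not the CodeIt implementation: the toy grid programs (`flip_h`, `transpose`), the `Experience` record, and the priority values (1.0 for a program that solves the task, 0.1 otherwise) are all hypothetical choices made only to show the idea of hindsight relabeling followed by priority-weighted sampling.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Grid = Tuple[Tuple[int, ...], ...]


def flip_h(grid: Grid) -> Grid:
    """Mirror each row left-to-right (toy program, assumed for illustration)."""
    return tuple(tuple(reversed(row)) for row in grid)


def transpose(grid: Grid) -> Grid:
    """Swap rows and columns (toy program, assumed for illustration)."""
    return tuple(zip(*grid))


# Hypothetical stand-in for programs sampled from a language model.
PROGRAMS: Dict[str, Callable[[Grid], Grid]] = {
    "flip_h": flip_h,
    "transpose": transpose,
}


@dataclass
class Experience:
    inputs: List[Grid]
    outputs: List[Grid]  # realized outputs, used as the relabeled goal
    program_name: str
    priority: float


def hindsight_relabel(
    inputs: List[Grid], target_outputs: List[Grid], program_name: str
) -> Experience:
    """Run a sampled program and store what it actually produced as the goal."""
    program = PROGRAMS[program_name]
    realized = [program(grid) for grid in inputs]
    solved = realized == target_outputs
    # Assumed priority scheme: favor programs that solve the original task.
    priority = 1.0 if solved else 0.1
    return Experience(inputs, realized, program_name, priority)


def sample_batch(
    buffer: List[Experience], k: int, rng: random.Random
) -> List[Experience]:
    """Draw k experiences with probability proportional to priority."""
    weights = [exp.priority for exp in buffer]
    return rng.choices(buffer, weights=weights, k=k)


if __name__ == "__main__":
    task_inputs = [((1, 2), (3, 4))]
    task_targets = [((2, 1), (4, 3))]  # the task is solved by flip_h
    buffer = [
        hindsight_relabel(task_inputs, task_targets, name) for name in PROGRAMS
    ]
    batch = sample_batch(buffer, k=4, rng=random.Random(0))
    for exp in batch:
        print(exp.program_name, exp.priority)
```

Even when the sampled program fails the original task, the relabeled experience is a valid (input, output, program) triple, so every sample yields a usable training example. This is how relabeling sidesteps the sparse reward.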