Large language models are increasingly solving tasks that are commonly believed to require human-level reasoning ability. However, these models still perform very poorly on benchmarks of general intelligence such as the Abstraction and Reasoning Corpus (ARC). In this paper, we approach ARC as a programming-by-examples problem, and introduce a novel and scalable method for language model self-improvement called Code Iteration (CodeIt). Our method iterates between 1) program sampling and hindsight relabeling, and 2) learning from prioritized experience replay. By relabeling the goal of an episode (i.e., the target program output given input) to the realized output produced by the sampled program, our method effectively deals with the extreme sparsity of rewards in program synthesis. Applying CodeIt to the ARC dataset, we demonstrate that prioritized hindsight replay, along with pre-training and data-augmentation, leads to successful inter-task generalization. CodeIt is the first neuro-symbolic approach that scales to the full ARC evaluation dataset. Our method solves 15% of ARC evaluation tasks, achieving state-of-the-art performance and outperforming existing neural and symbolic baselines. Our code is available at https://github.com/Qualcomm-AI-research/codeit .
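As an illustration of the loop described above, the following is a minimal sketch of hindsight relabeling combined with priority-weighted replay sampling. It is not the CodeIt implementation: the three-primitive grid DSL, the random `sample_program` stand-in for the language model policy, the `Experience` record, and the priority values (1.0 for programs that solve the original task, 0.1 for relabeled ones) are all assumptions made for this example.

```python
import random
from dataclasses import dataclass

Grid = tuple[tuple[int, ...], ...]

# Hypothetical toy DSL (an assumption): each primitive maps a grid to a grid.
PRIMITIVES = {
    "flip_h": lambda g: tuple(row[::-1] for row in g),
    "flip_v": lambda g: g[::-1],
    "transpose": lambda g: tuple(zip(*g)),
}


@dataclass(frozen=True)
class Experience:
    inputs: tuple[Grid, ...]
    outputs: tuple[Grid, ...]  # relabeled goal: what the program actually produced
    program: tuple[str, ...]
    priority: float


def run(program, grid):
    """Apply each primitive of the program to the grid, in order."""
    for name in program:
        grid = PRIMITIVES[name](grid)
    return grid


def sample_program(rng, max_len=2):
    """Random stand-in for sampling a program from the language model."""
    length = rng.randint(1, max_len)
    return tuple(rng.choice(sorted(PRIMITIVES)) for _ in range(length))


def hindsight_relabel(inputs, targets, program,
                      solved_priority=1.0, relabeled_priority=0.1):
    """Replace the target outputs with the outputs the program realized.

    The resulting (inputs, realized outputs, program) triple is always a
    valid training example, even when the program misses the original goal.
    The priority values are illustrative assumptions.
    """
    realized = tuple(run(program, g) for g in inputs)
    solved = realized == tuple(targets)
    priority = solved_priority if solved else relabeled_priority
    return Experience(tuple(inputs), realized, program, priority)


def sample_batch(buffer, k, rng):
    """Draw k experiences with probability proportional to priority."""
    weights = [e.priority for e in buffer]
    return rng.choices(buffer, weights=weights, k=k)
```

In a full training loop, batches drawn by `sample_batch` would be used to fine-tune the model that proposes programs, and the updated model would then produce the next round of samples.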