Compositional generalization is a key ability of humans that enables us to learn new concepts from only a handful of examples. Neural machine learning models, including the now ubiquitous Transformers, struggle to generalize in this way and typically require thousands of examples of a concept during training in order to generalize meaningfully. This difference in ability between humans and artificial neural architectures motivates this study of a neuro-symbolic architecture called the Compositional Program Generator (CPG). CPG has three key features, \textit{modularity}, \textit{composition}, and \textit{abstraction}, realized in the form of grammar rules, which enable it to generalize both systematically to new concepts in a few-shot manner and productively to longer inputs on various sequence-to-sequence language tasks. For each input, CPG uses a grammar of the input language and a parser to generate a parse in which each grammar rule is assigned its own unique semantic module, a probabilistic copy or substitution program. Instances with the same parse are always processed with the same composed modules, while those with different parses may be processed with different modules. CPG learns parameters for the modules and can learn the semantics of new rules and types incrementally, without forgetting or retraining on rules it has already seen. It achieves perfect generalization on both the SCAN and COGS benchmarks using just 14 examples for SCAN and 22 examples for COGS, which is state-of-the-art accuracy with a 1000x improvement in sample efficiency.
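The parse-then-compose pipeline described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the toy SCAN-like rules `Prim`, `Twice`, `And`, and `After`, and the names `Node`, `SubstitutionModule`, `CopyModule`, and `evaluate`, are all hypothetical. Parse trees are written by hand instead of being produced by a parser, and the modules are deterministic (the substitution module takes the most frequent observed token) rather than CPG's probabilistic programs.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Node:
    """Hypothetical parse-tree node: the grammar rule that produced it plus
    its children, which are sub-trees or raw input words (str)."""
    rule: str
    children: List[Union["Node", str]]


class SubstitutionModule:
    """Toy substitution program: maps an input word to an output token using
    counts as a stand-in for learned probabilistic parameters."""

    def __init__(self) -> None:
        self.counts: Dict[str, Counter] = defaultdict(Counter)

    def observe(self, word: str, token: str) -> None:
        self.counts[word][token] += 1

    def __call__(self, word: str) -> List[str]:
        if word not in self.counts:
            raise KeyError(f"no substitution learned for {word!r}")
        token, _ = self.counts[word].most_common(1)[0]  # most frequent token
        return [token]


class CopyModule:
    """Toy copy program: concatenates child outputs in the order given by
    `program` (child indices, repeats allowed). Deterministic in this sketch."""

    def __init__(self, program: List[int]) -> None:
        self.program = program

    def __call__(self, *child_outputs: List[str]) -> List[str]:
        out: List[str] = []
        for i in self.program:
            out.extend(child_outputs[i])
        return out


Module = Union[SubstitutionModule, CopyModule]


def evaluate(node: Node, modules: Dict[str, Module]) -> List[str]:
    """Run the module composition dictated by the parse: one module per rule."""
    module = modules[node.rule]
    if isinstance(module, SubstitutionModule):
        return module(node.children[0])  # leaf rule with a single word
    # Function words inside non-leaf rules (e.g. "twice", "and") emit nothing.
    child_outputs = [evaluate(c, modules) if isinstance(c, Node) else []
                     for c in node.children]
    return module(*child_outputs)


# Hypothetical SCAN-like rules; parse trees are built by hand.
prim = SubstitutionModule()
prim.observe("walk", "WALK")
prim.observe("jump", "JUMP")
modules: Dict[str, Module] = {
    "Prim": prim,
    "Twice": CopyModule([0, 0]),  # "x twice" -> x x
    "And": CopyModule([0, 2]),    # "x and y" -> x y
}

tree = Node("And", [Node("Twice", [Node("Prim", ["jump"]), "twice"]),
                    "and", Node("Prim", ["walk"])])
print(evaluate(tree, modules))  # ['JUMP', 'JUMP', 'WALK']

# A new rule gets its own new module; the Twice and And modules are reused as-is.
modules["After"] = CopyModule([2, 0])  # "x after y" -> y x
# A new primitive from one example; the entries for walk/jump are unchanged.
prim.observe("look", "LOOK")
tree2 = Node("After", [Node("Twice", [Node("Prim", ["look"]), "twice"]),
                       "after", Node("Prim", ["walk"])])
print(evaluate(tree2, modules))  # ['WALK', 'LOOK', 'LOOK']
```

Because each module is looked up by rule name, two inputs with the same parse always run through the same composed modules. Adding the `After` rule only adds a dictionary entry, so `Twice` and `And` are never retrained. This mirrors, in a very simplified form, the incremental learning property claimed for CPG.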