Compositional generalization is a key ability of humans that enables us to learn new concepts from only a handful of examples. Machine learning models, including the now ubiquitous transformers, struggle to generalize in this way, and typically require thousands of examples of a concept during training in order to generalize meaningfully. This difference in ability between humans and artificial neural architectures motivates this study of a neuro-symbolic architecture called the Compositional Program Generator (CPG). Three key features of CPG (modularity, type abstraction, and recursive composition) enable it to generalize both systematically, to new concepts in a few-shot manner, and productively, to longer sequences, on various sequence-to-sequence language tasks. For each input, CPG uses a grammar of the input domain and a parser to generate a type hierarchy in which each grammar rule is assigned its own unique semantic module, a probabilistic copy or substitution program. Instances with the same hierarchy are processed with the same composed program, while those with different hierarchies may be processed with different programs. CPG learns the parameters of the semantic modules and can learn the semantics of new types incrementally. Given a context-free grammar of the input language and a dictionary mapping each word in the source language to its interpretation in the output language, CPG can achieve perfect generalization on the SCAN and COGS benchmarks, in both standard and extreme few-shot settings.
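The idea of one semantic module per grammar rule, composed recursively over a parse, can be illustrated with a toy sketch. This is not the CPG implementation: the rule names, the SCAN-like lexicon, and the hand-built parse trees are all hypothetical, and the modules are fixed, hand-written copy programs rather than the learned probabilistic ones the abstract describes.

```python
from typing import Callable, Dict, List, Tuple, Union

Tokens = List[str]
# A parse node is either a source word (leaf) or (rule_name, [children]).
Node = Union[str, Tuple[str, list]]

# Hypothetical dictionary mapping each source word to its output tokens.
# Function words contribute no tokens of their own.
LEXICON: Dict[str, Tokens] = {
    "jump": ["JUMP"],
    "walk": ["WALK"],
    "left": ["LTURN"],
    "twice": [],
    "after": [],
    "and": [],
}

# One module per grammar rule (hypothetical rule names). Each module is a
# fixed copy program over the outputs of its children; in CPG these would be
# learned and probabilistic, which this sketch does not attempt.
MODULES: Dict[str, Callable[[List[Tokens]], Tokens]] = {
    "V_twice": lambda c: c[0] + c[0],   # repeat the verb phrase
    "V_dir": lambda c: c[1] + c[0],     # turn first, then act
    "S_and": lambda c: c[0] + c[2],     # left clause, then right clause
    "S_after": lambda c: c[2] + c[0],   # right clause, then left clause
}


def interpret(node: Node) -> Tokens:
    """Recursively compose the modules selected by the parse tree."""
    if isinstance(node, str):
        return list(LEXICON[node])
    rule, children = node
    return MODULES[rule]([interpret(child) for child in children])


# Hand-built parse of "walk left after jump twice" (no parser is included).
tree: Node = (
    "S_after",
    [("V_dir", ["walk", "left"]), "after", ("V_twice", ["jump", "twice"])],
)

# Same rule hierarchy, different words: "jump left after walk twice".
tree_swapped: Node = (
    "S_after",
    [("V_dir", ["jump", "left"]), "after", ("V_twice", ["walk", "twice"])],
)

if __name__ == "__main__":
    print(interpret(tree))
    print(interpret(tree_swapped))
```

Both trees share the same rule hierarchy, so the same composed program processes them and only the lexical leaves differ. In this toy setting, that is why a new word needs only a dictionary entry rather than new module parameters.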