In tasks like semantic parsing, instruction following, and question answering, standard deep networks fail to generalize compositionally from small datasets. Many existing approaches overcome this limitation with model architectures that enforce a compositional process of sentence interpretation. In this paper, we present a domain-general and model-agnostic formulation of compositionality as a constraint on symmetries of data distributions rather than models. Informally, we prove that whenever a task can be solved by a compositional model, there is a corresponding data augmentation scheme -- a procedure for transforming examples into other well-formed examples -- that imparts compositional inductive bias on any model trained to solve the same task. We describe a procedure called LEXSYM that discovers these transformations automatically, then applies them to training data for ordinary neural sequence models. Unlike existing compositional data augmentation procedures, LEXSYM can be deployed agnostically across text, structured data, and even images. It matches or surpasses state-of-the-art, task-specific models on the COGS semantic parsing, SCAN and ALCHEMY instruction following, and CLEVR-COGENT visual question answering datasets.
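To make the idea of transforming an example into another well-formed example concrete, here is a minimal sketch of one such transformation: swapping two words in the input together with their corresponding symbols in the output. It is an illustration under stated assumptions, not the LEXSYM procedure itself. The hand-written `lexicon`, the function name `swap_lexical_items`, and the choice of `jump` and `walk` as interchangeable items are all hypothetical; the abstract states that LEXSYM discovers its transformations automatically.

```python
from typing import Dict, List, Tuple

# An example is a pair (input tokens, output tokens).
Example = Tuple[List[str], List[str]]


def swap_lexical_items(
    example: Example, lexicon: Dict[str, str], a: str, b: str
) -> Example:
    """Swap words `a` and `b` in the input and their lexicon entries in the output.

    Assumption: `lexicon` maps each input word to exactly one output symbol,
    and `a` and `b` play the same role in the task, so the swapped pair is
    still a valid example.
    """
    src, tgt = example
    src_map = {a: b, b: a}
    tgt_map = {lexicon[a]: lexicon[b], lexicon[b]: lexicon[a]}
    new_src = [src_map.get(tok, tok) for tok in src]
    new_tgt = [tgt_map.get(tok, tok) for tok in tgt]
    return new_src, new_tgt


# Hypothetical hand-written lexicon in the style of SCAN commands and actions.
lexicon = {"jump": "I_JUMP", "walk": "I_WALK"}
example = ("jump twice".split(), "I_JUMP I_JUMP".split())

augmented = swap_lexical_items(example, lexicon, "jump", "walk")
print(augmented)
```

Adding `augmented` to the training set alongside `example` exposes a model to `walk` in a context it may only have seen with `jump`. The model itself is unchanged, which is the sense in which augmentation of this kind is model-agnostic.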