Many tasks can be described as compositions over subroutines. Although modern neural networks achieve impressive performance on both vision and language tasks, we know little about the functions they implement. One possibility is that neural networks implicitly break complex tasks down into subroutines, implement modular solutions to those subroutines, and compose them into an overall solution, a property we term structural compositionality. Alternatively, they may simply learn to match new inputs to memorized representations, eliding task decomposition entirely. Here, we use model pruning techniques to investigate this question in both vision and language, across a variety of architectures, tasks, and pretraining regimens. Our results demonstrate that models often implement solutions to subroutines via modular subnetworks, which can be ablated while preserving the functionality of other subroutines. This suggests that neural networks may be able to learn compositionality, obviating the need for specialized symbolic mechanisms.
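To make the idea of ablating a modular subnetwork concrete, here is a minimal toy sketch. It is not the procedure used in this work: the network, its weights, the two "subroutines", and the names `W1`, `W2`, `ABLATE_A`, and `forward` are all hypothetical, and the binary mask is set by hand, whereas a pruning method would have to discover it. The sketch only shows what it means for one subnetwork to be ablated while the other subroutine keeps working.

```python
# Toy illustration (hypothetical network, hand-set weights and mask).
# Hidden units 0-1 serve "subroutine A" (reads inputs 0 and 1);
# hidden units 2-3 serve "subroutine B" (reads inputs 2 and 3).

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, x):
    # Multiply matrix W (list of rows) by vector x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def apply_mask(W, M):
    # Elementwise product of a weight matrix with a binary mask.
    return [[w * m for w, m in zip(w_row, m_row)] for w_row, m_row in zip(W, M)]

W1 = [
    [1.0,  1.0, 0.0,  0.0],  # hidden 0: subroutine A
    [1.0, -1.0, 0.0,  0.0],  # hidden 1: subroutine A
    [0.0,  0.0, 1.0,  1.0],  # hidden 2: subroutine B
    [0.0,  0.0, 1.0, -1.0],  # hidden 3: subroutine B
]
W2 = [
    [1.0, 1.0, 0.0, 0.0],    # output 0 reads only the A units
    [0.0, 0.0, 1.0, 1.0],    # output 1 reads only the B units
]

# Binary mask that zeroes every first-layer weight belonging to subnetwork A.
ABLATE_A = [[0.0] * 4, [0.0] * 4, [1.0] * 4, [1.0] * 4]

def forward(x, mask=None):
    # Run the toy network, optionally with a mask applied to the first layer.
    W = apply_mask(W1, mask) if mask is not None else W1
    return matvec(W2, relu(matvec(W, x)))

x = [2.0, 1.0, 3.0, 1.0]
full = forward(x)               # both subroutines intact
ablated = forward(x, ABLATE_A)  # subnetwork A removed

print("full:   ", full)
print("ablated:", ablated)
```

In this hand-built case the two subnetworks share no weights, so removing A changes only the first output and leaves the second untouched. In a trained model the separation would be a matter of degree and would have to be measured rather than assumed.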