How well do neural networks generalize? Even for grammar induction tasks, where the target generalization is fully known, previous work has left the question open, testing only very limited ranges beyond the training set and using differing success criteria. We provide a measure of neural network generalization based on fully specified formal languages. Given a model and a formal grammar, the method assigns a generalization score that reflects how well the model generalizes to unseen samples, in inverse relation to the amount of data it was trained on. The benchmark includes languages such as $a^nb^n$, $a^nb^nc^n$, $a^nb^mc^{n+m}$, Dyck-1, and Dyck-2. We evaluate selected architectures using the benchmark and find that networks trained with a Minimum Description Length (MDL) objective generalize better, and with less data, than networks trained using standard loss functions. The benchmark is available at https://github.com/taucompling/bliss.
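To make "fully specified" concrete: each target language named above can be decided exactly by a few lines of code, so test strings can be generated and checked arbitrarily far beyond the training range. The following is a minimal Python sketch, assuming $n, m \ge 0$ and using `(`/`)` and `[`/`]` as the Dyck bracket pairs. All function names are hypothetical and for illustration only. This is not the BLiSS implementation, and it does not compute the benchmark's generalization score.

```python
import re


def is_anbn(s: str) -> bool:
    """Membership in a^n b^n, n >= 0 (hypothetical helper)."""
    m = re.fullmatch(r"(a*)(b*)", s)
    return m is not None and len(m.group(1)) == len(m.group(2))


def is_anbncn(s: str) -> bool:
    """Membership in a^n b^n c^n, n >= 0 (hypothetical helper)."""
    m = re.fullmatch(r"(a*)(b*)(c*)", s)
    return m is not None and len(m.group(1)) == len(m.group(2)) == len(m.group(3))


def is_anbmcnm(s: str) -> bool:
    """Membership in a^n b^m c^(n+m), n, m >= 0 (hypothetical helper)."""
    m = re.fullmatch(r"(a*)(b*)(c*)", s)
    return m is not None and len(m.group(1)) + len(m.group(2)) == len(m.group(3))


def is_dyck(s: str, pairs: dict[str, str]) -> bool:
    """Balanced-bracket check; `pairs` maps each opener to its closer."""
    closer_to_opener = {close: open_ for open_, close in pairs.items()}
    stack: list[str] = []
    for ch in s:
        if ch in pairs:
            stack.append(ch)  # remember the opener
        elif ch in closer_to_opener:
            # A closer must match the most recent unmatched opener.
            if not stack or stack.pop() != closer_to_opener[ch]:
                return False
        else:
            return False  # symbol outside the alphabet
    return not stack  # every opener must be closed


DYCK1 = {"(": ")"}
DYCK2 = {"(": ")", "[": "]"}


def anbn_samples(n_min: int, n_max: int) -> list[str]:
    """All strings a^n b^n with n_min <= n <= n_max, in order of n."""
    return ["a" * n + "b" * n for n in range(n_min, n_max + 1)]
```

For example, a model could be trained on `anbn_samples(0, 10)` and then probed on `anbn_samples(11, 1000)`, with `is_anbn` serving as the exact oracle for the held-out range. The specific cutoffs here are arbitrary and not taken from the benchmark.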