How well do neural networks generalize? Even for grammar induction tasks, where the target generalization is fully known, previous work has left the question open, testing only very limited ranges beyond the training set and using differing success criteria. We provide a measure of neural network generalization based on fully specified formal languages. Given a model and a formal grammar, the method assigns a generalization score representing how well the model generalizes to unseen samples, in inverse relation to the amount of data it was trained on. The benchmark includes languages such as $a^nb^n$, $a^nb^nc^n$, $a^nb^mc^{n+m}$, and Dyck-1 and Dyck-2. We evaluate selected architectures using the benchmark and find that networks trained with a Minimum Description Length (MDL) objective generalize better, and with less data, than networks trained using standard loss functions. The benchmark is available at https://github.com/taucompling/bliss.
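To make the benchmark languages concrete, the following is a minimal sketch of membership checks for the languages named above. It is not taken from the benchmark repository: the function names, the bracket alphabets chosen for Dyck-1 and Dyck-2, and the convention that $n, m \geq 1$ are all assumptions made for illustration.

```python
import re
from typing import Dict

# Illustrative sketch only; names and conventions are assumptions,
# not the API of the benchmark repository.

def is_anbn(s: str) -> bool:
    """True if s = a^n b^n for some n >= 1 (n >= 1 is an assumption)."""
    n = len(s) // 2
    return n >= 1 and s == "a" * n + "b" * n

def is_anbncn(s: str) -> bool:
    """True if s = a^n b^n c^n for some n >= 1."""
    n = len(s) // 3
    return n >= 1 and s == "a" * n + "b" * n + "c" * n

def is_anbmcnm(s: str) -> bool:
    """True if s = a^n b^m c^(n+m) for some n, m >= 1."""
    match = re.fullmatch(r"(a+)(b+)(c+)", s)
    if match is None:
        return False
    a_run, b_run, c_run = match.groups()
    return len(c_run) == len(a_run) + len(b_run)

def is_dyck(s: str, pairs: Dict[str, str]) -> bool:
    """True if s is well-bracketed over the opener -> closer map `pairs`."""
    closer_to_opener = {closer: opener for opener, closer in pairs.items()}
    stack = []
    for ch in s:
        if ch in pairs:
            stack.append(ch)
        elif ch in closer_to_opener:
            # A closer must match the most recent unmatched opener.
            if not stack or stack.pop() != closer_to_opener[ch]:
                return False
        else:
            return False  # symbol outside the alphabet
    return not stack  # every opener must have been closed

# Assumed alphabets: one bracket type for Dyck-1, two for Dyck-2.
DYCK1 = {"(": ")"}
DYCK2 = {"(": ")", "[": "]"}

if __name__ == "__main__":
    print(is_anbn("aaabbb"))           # well-formed a^n b^n
    print(is_anbmcnm("aabccc"))        # n = 2, m = 1, so three c's
    print(is_dyck("([])[]", DYCK2))    # balanced over two bracket types
    print(is_dyck("([)]", DYCK2))      # crossing brackets are rejected
```

Because each language is fully specified, checks like these can label strings of any length, which is what allows generalization to be tested far beyond the range seen in training.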