Evaluating machine learning (ML) systems on their ability to learn known classifiers allows fine-grained examination of the patterns they can learn, which builds confidence when they are applied to the learning of unknown classifiers. This article presents MLRegTest, a new benchmark for ML systems on sequence classification, which contains training, development, and test sets from 1,800 regular languages. Different kinds of formal languages represent different kinds of long-distance dependencies, and correctly identifying long-distance dependencies in sequences is a known obstacle to successful generalization by ML systems. MLRegTest organizes its languages according to their logical complexity (monadic second order, first order, propositional, or monomial expressions) and the kind of logical literals (string, tier-string, subsequence, or combinations thereof). Logical complexity and choice of literal together provide a systematic way to understand different kinds of long-distance dependencies in regular languages, and therefore to understand the capacities of different ML systems to learn such dependencies. Finally, the performance of different neural networks (simple RNN, LSTM, GRU, transformer) on MLRegTest is examined. The main conclusion is that their performance depends significantly on the kind of test set, the class of language, and the neural network architecture.
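As an illustration of the three kinds of logical literals named in the abstract (string, tier-string, subsequence), the following minimal Python sketch tests whether a word satisfies one literal of each kind. It is not taken from MLRegTest: the function names, the example alphabet, and the choice of tier are assumptions made for illustration, and word-boundary markers are omitted for brevity.

```python
# Hypothetical helpers illustrating three kinds of literals; not MLRegTest code.

def has_substring(word: str, factor: str) -> bool:
    """String literal: `factor` occurs as a contiguous block in `word`."""
    return factor in word


def has_subsequence(word: str, factor: str) -> bool:
    """Subsequence literal: the symbols of `factor` occur in `word`
    in order, with any number of symbols allowed in between."""
    remaining = iter(word)
    # `symbol in remaining` consumes the iterator up to the first match,
    # so later symbols are only searched for to the right of earlier ones.
    return all(symbol in remaining for symbol in factor)


def has_tier_substring(word: str, factor: str, tier: set) -> bool:
    """Tier-string literal: erase every symbol not on `tier`, then
    check whether `factor` is contiguous in what remains."""
    projected = "".join(symbol for symbol in word if symbol in tier)
    return factor in projected


if __name__ == "__main__":
    tier = {"a", "b", "d"}  # assumed tier; "c" is ignored on this tier
    for word in ("acccb", "adccb"):
        print(
            word,
            has_substring(word, "ab"),
            has_subsequence(word, "ab"),
            has_tier_substring(word, "ab", tier),
        )
```

In this sketch, "acccb" satisfies the subsequence and tier-string literals for "ab" but not the string literal, while "adccb" satisfies only the subsequence literal because "d" intervenes on the tier. Each kind of literal therefore picks out a different kind of long-distance dependency between "a" and "b".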