Sentence embeddings induced with various transformer architectures encode a great deal of semantic and syntactic information in a distributed manner within a one-dimensional array. We investigate whether specific grammatical information can be accessed in these distributed representations. Using data from a task developed to test rule-like generalizations, our experiments on detecting subject-verb agreement yield several promising results. First, we show that while the usual sentence representations, encoded as one-dimensional arrays, do not easily support the extraction of rule-like regularities, a two-dimensional reshaping of these vectors allows various learning architectures to access such information. Next, we show that various architectures can detect patterns in these two-dimensional reshaped sentence embeddings and successfully learn, from smaller amounts of simpler training data, a model that performs well on more complex test data. This indicates that current sentence embeddings contain regularly distributed information that can be captured when the embeddings are reshaped into higher-dimensional arrays. Our results shed light on the representations produced by language models and help move towards few-shot learning approaches.
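As a rough illustration of the reshaping step described above, the following minimal sketch turns a one-dimensional embedding into a two-dimensional array with NumPy. The embedding size (768), the target shape (24 x 32), the random stand-in vector, and the helper name `reshape_embedding` are all assumptions made for illustration; the abstract does not specify which dimensions or which reshaping order were used.

```python
import numpy as np


def reshape_embedding(vec, rows, cols):
    """Reshape a 1D sentence embedding into a (rows, cols) array.

    Hypothetical helper: the row-major order used here is an assumption,
    not necessarily the layout used in the experiments.
    """
    vec = np.asarray(vec)
    if vec.ndim != 1:
        raise ValueError("expected a one-dimensional embedding")
    if rows * cols != vec.size:
        raise ValueError(
            f"cannot reshape {vec.size} values into {rows} x {cols}"
        )
    # Row-major reshape: element i lands at (i // cols, i % cols).
    return vec.reshape(rows, cols)


# Stand-in for a real sentence embedding (assumed size: 768).
rng = np.random.default_rng(0)
embedding = rng.standard_normal(768)

# Assumed target shape; any factorization of 768 would be valid.
grid = reshape_embedding(embedding, 24, 32)
```

The resulting two-dimensional array could then be passed to an architecture that exploits local structure, such as a convolutional network. The reshape itself adds no information; it only changes which values sit next to each other.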