Language models have become nearly ubiquitous in natural language processing applications, achieving state-of-the-art results in many tasks, including those involving prosody. Because the model design does not define predetermined linguistic targets during training but instead aims to learn generalized representations of the language, analyzing and interpreting the representations that models implicitly capture is important for bridging the gap between interpretability and model performance. Several studies have explored the linguistic information that models capture, providing some insight into their representational capacity. However, existing studies have not explored whether prosody is part of the structural information of the language that models learn. In this work, we perform a series of probing experiments on BERT, examining the representations captured at different layers. Our results show that information about prosodic prominence spans many layers but is concentrated mostly in the middle layers, suggesting that BERT relies mostly on syntactic and semantic information.