Syntactic structures used to play a vital role in natural language processing (NLP), but since the deep learning revolution, NLP has been gradually dominated by neural models that do not consider syntactic structures in their design. One vastly successful class of neural models is transformers. When used as an encoder, a transformer produces contextual representations of the words in the input sentence. In this work, we propose a new model of contextual word representation, not from a neural perspective, but from a purely syntactic and probabilistic perspective. Specifically, we design a conditional random field that models discrete latent representations of all words in a sentence as well as the dependency arcs between them, and we use mean field variational inference for approximate inference. Strikingly, we find that the computation graph of our model resembles that of transformers, with correspondences between dependencies and self-attention and between distributions over latent representations and contextual embeddings of words. Experiments show that our model performs competitively with transformers on small- to medium-sized datasets. We hope that our work could help bridge the gap between traditional syntactic and probabilistic approaches and cutting-edge neural approaches to NLP, and inspire more linguistically principled neural approaches in the future.
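To make the stated correspondence concrete, the following is a minimal sketch of mean field updates on a toy model of this kind. It is an illustration under simplifying assumptions, not the implementation used in this work: each word has one discrete latent label and one head variable, a single hypothetical score matrix `T` rates every (child label, head label) pair, there is no root node or tree constraint, all updates are synchronous, and the sentence has at least two words. The names `mean_field`, `unary`, and `T` are invented for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def mean_field(unary, T, n_iters=5):
    """Toy mean field inference over latent labels and dependency heads.

    unary: (n, d) scores of each of d latent labels for each of n words.
    T:     (d, d) hypothetical score for (child label a, head label b).
    Returns q_z (n, d), the label distributions, and q_h (n, n), where
    row i is the distribution over the head of word i.
    """
    n, d = unary.shape
    q_z = softmax(unary, axis=-1)
    # A word may not be its own head (assumes n >= 2).
    no_self = np.where(np.eye(n, dtype=bool), -np.inf, 0.0)
    for _ in range(n_iters):
        # Expected arc score between words i and j under the current label
        # distributions; this plays the role of attention logits.
        arc_scores = q_z @ T @ q_z.T + no_self
        q_h = softmax(arc_scores, axis=-1)
        # Messages to each word's label from its likely heads and children;
        # this plays the role of the attention-weighted value sum.
        msg_as_child = q_h @ (q_z @ T.T)
        msg_as_head = q_h.T @ (q_z @ T)
        q_z = softmax(unary + msg_as_child + msg_as_head, axis=-1)
    return q_z, q_h
```

In this sketch, `q_h` is a row-normalized matrix over candidate heads, which is the analogue of a self-attention map, and `q_z` is a per-word distribution over latent labels, which is the analogue of a contextual embedding. Each iteration of the loop loosely mirrors one transformer layer.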