Probabilistic Transformer: A Probabilistic Dependency Model for Contextual Word Representation

Syntactic structures used to play a vital role in natural language processing (NLP), but since the deep learning revolution, NLP has been gradually dominated by neural models that do not consider syntactic structures in their design. One vastly successful class of neural models is transformers. When used as an encoder, a transformer produces contextual representation of words in the input sentence. In this work, we propose a new model of contextual word representation, not from a neural perspective, but from a purely syntactic and probabilistic perspective. Specifically, we design a conditional random field that models discrete latent representations of all words in a sentence as well as dependency arcs between them; and we use mean field variational inference for approximate inference. Strikingly, we find that the computation graph of our model resembles transformers, with correspondences between dependencies and self-attention and between distributions over latent representations and contextual embeddings of words. Experiments show that our model performs competitively to transformers on small to medium sized datasets. We hope that our work could help bridge the gap between traditional syntactic and probabilistic approaches and cutting-edge neural approaches to NLP, and inspire more linguistically-principled neural approaches in the future.

翻译：句法结构在自然语言处理中曾扮演关键角色，但自深度学习革命以来，NLP领域逐渐被设计中不考虑句法结构的神经模型主导。其中一类极为成功的神经模型是Transformer。当用作编码器时，Transformer能生成输入句子中单词的上下文表示。本研究提出一种新的上下文词表示模型，它并非源于神经视角，而是纯粹从句法和概率角度出发。具体而言，我们设计了一个条件随机场，用于建模句子中所有单词的离散潜在表示及其间的依存弧，并采用平均场变分推断进行近似推理。引人注目的是，我们的模型计算图与Transformer高度相似：依存关系对应自注意力机制，潜在表示分布对应单词的上下文嵌入。实验表明，在中小型数据集上，我们的模型性能与Transformer相当。我们希望这项工作能弥合传统句法概率方法与前沿神经方法之间的鸿沟，并为未来更多基于语言学原理的神经方法提供启发。

相关内容

MoDELS

关注 46

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

【CHI2020-微软】解释可解释性:理解数据科学家使用机器学习的可解释性工具，Interpreting Interpretability: Understanding Data Scientists’Use of Interpretability Tools for Machine Learning

专知会员服务

55+阅读 · 2020年3月8日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

35+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日