Constraining Linear-chain CRFs to Regular Languages

A major challenge in structured prediction is to represent the interdependencies within output structures. When outputs are structured as sequences, linear-chain conditional random fields (CRFs) are a widely used model class which can learn \textit{local} dependencies in the output. However, the CRF's Markov assumption makes it impossible for CRFs to represent distributions with \textit{nonlocal} dependencies, and standard CRFs are unable to respect nonlocal constraints of the data (such as global arity constraints on output labels). We present a generalization of CRFs that can enforce a broad class of constraints, including nonlocal ones, by specifying the space of possible output structures as a regular language $\mathcal{L}$. The resulting regular-constrained CRF (RegCCRF) has the same formal properties as a standard CRF, but assigns zero probability to all label sequences not in $\mathcal{L}$. Notably, RegCCRFs can incorporate their constraints during training, while related models only enforce constraints during decoding. We prove that constrained training is never worse than constrained decoding, and show empirically that it can be substantially better in practice. Additionally, we demonstrate a practical benefit on downstream tasks by incorporating a RegCCRF into a deep neural model for semantic role labeling, exceeding state-of-the-art results on a standard dataset.
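
The abstract's central idea is that a RegCCRF assigns zero probability outside a regular language $\mathcal{L}$ while keeping CRF-style normalization. A small sketch can illustrate it under stated assumptions. It assumes $\mathcal{L}$ is given as a deterministic finite automaton (DFA) and that a first-order CRF's emission and transition scores are already computed. The names `emit`, `trans`, `constrained_log_z`, `constrained_log_prob`, and the `(start, delta, accept)` DFA triple are hypothetical. The sketch runs the forward algorithm over pairs of (automaton state, previous label). This is one possible product construction and not necessarily the one used for RegCCRF. Because the automaton is deterministic, each label sequence follows at most one path, so the result is the normalizer over $\mathcal{L}$ only. The toy constraint "label 1 occurs exactly once" is a global arity constraint of the kind a plain linear-chain CRF cannot enforce.

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x))) over a non-empty list of floats."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))

def score(y, emit, trans):
    """First-order CRF score: emission terms plus label-to-label transition terms."""
    s = sum(emit[t][label] for t, label in enumerate(y))
    return s + sum(trans[a][b] for a, b in zip(y, y[1:]))

def accepts(y, dfa):
    """Run a hypothetical DFA triple (start, delta, accept); a missing transition rejects."""
    q, delta, accept = dfa
    for label in y:
        q = delta.get((q, label))
        if q is None:
            return False
    return q in accept

def constrained_log_z(emit, trans, dfa):
    """Forward algorithm over (DFA state, previous label) pairs.

    With a deterministic automaton, every label sequence has at most one path,
    so this sums exp(score) over exactly the sequences the DFA accepts.
    """
    start, delta, accept = dfa
    K = len(emit[0])
    alpha = {}
    for y in range(K):
        q = delta.get((start, y))
        if q is not None:
            alpha[(q, y)] = emit[0][y]
    for t in range(1, len(emit)):
        new = {}
        for (q, prev), a in alpha.items():
            for y in range(K):
                q2 = delta.get((q, y))
                if q2 is None:  # transition leaves the language: prune
                    continue
                s = a + trans[prev][y] + emit[t][y]
                new[(q2, y)] = logsumexp([new.get((q2, y), -math.inf), s])
        alpha = new
    finals = [a for (q, _), a in alpha.items() if q in accept]
    return logsumexp(finals) if finals else -math.inf

def constrained_log_prob(y, emit, trans, dfa):
    """log p_L(y); returns -inf (zero probability) for any y outside the language."""
    if not accepts(y, dfa):
        return -math.inf
    return score(y, emit, trans) - constrained_log_z(emit, trans, dfa)

# Hypothetical toy setup: labels {0, 1}, constraint "label 1 occurs exactly once".
EXACTLY_ONE = (0, {(0, 0): 0, (0, 1): 1, (1, 0): 1}, {1})
ANY = (0, {(0, 0): 0, (0, 1): 0}, {0})  # accepts every sequence (plain CRF)
emit = [[0.5, 0.1], [0.2, 0.9], [0.3, 0.4]]
trans = [[0.1, -0.2], [0.3, 0.0]]

print(constrained_log_prob((0, 1, 0), emit, trans, EXACTLY_ONE))
print(constrained_log_prob((1, 1, 0), emit, trans, EXACTLY_ONE))  # -inf: label 1 occurs twice
```

In this sketch, training with the constrained normalizer gives a gold sequence $y \in \mathcal{L}$ the loss $\log Z_{\mathcal{L}} - s(y)$ instead of $\log Z - s(y)$. Since $Z_{\mathcal{L}} \le Z$, no probability mass is spent on sequences outside $\mathcal{L}$. Passing the permissive `ANY` automaton recovers the ordinary unconstrained CRF normalizer.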

Conditional Random Fields

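Conditional random fields (CRFs) are a discriminative probabilistic model and a type of random field. They are commonly used to label or parse sequential data such as natural-language text or biological sequences. Like Markov random fields, CRFs are undirected graphical models: vertices represent random variables, and edges represent dependencies between them. In a CRF, the distribution of the random variable Y is a conditional probability given the observed random variable X. In principle, the graph structure of a CRF can be arbitrary. The most common choice is a linear chain, for which efficient algorithms exist for training, inference, and decoding.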
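
The Viterbi algorithm illustrates the efficient decoding available for linear-chain CRFs. This minimal sketch finds the highest-scoring label sequence in $O(TK^2)$ time for $T$ positions and $K$ labels, rather than enumerating all $K^T$ sequences. It assumes precomputed emission scores `emit[t][y]` and first-order transition scores `trans[a][b]`. The function name, these parameter names, and the toy scores are hypothetical.

```python
def viterbi(emit, trans):
    """Return (best label sequence, its score) for a first-order linear-chain CRF.

    emit[t][y] scores label y at position t; trans[a][b] scores the
    transition from label a to label b.
    """
    T, K = len(emit), len(emit[0])
    best = list(emit[0])  # best[y]: top score of any prefix ending in label y
    back = []             # back[t - 1][y]: best previous label for y at position t
    for t in range(1, T):
        prev = best
        best, ptr = [], []
        for y in range(K):
            p = max(range(K), key=lambda a: prev[a] + trans[a][y])
            best.append(prev[p] + trans[p][y] + emit[t][y])
            ptr.append(p)
        back.append(ptr)
    # Follow back-pointers from the best final label to recover the path.
    y = max(range(K), key=lambda a: best[a])
    score = best[y]
    path = [y]
    for ptr in reversed(back):
        y = ptr[y]
        path.append(y)
    return tuple(reversed(path)), score

# Hypothetical toy scores: 3 positions, 2 labels.
emit = [[0.5, 0.1], [0.2, 0.9], [0.3, 0.4]]
trans = [[0.1, -0.2], [0.3, 0.0]]
path, best_score = viterbi(emit, trans)
print(path)  # (0, 1, 0)
```

Each position keeps only the best prefix score per label, together with a back-pointer. This dynamic program is possible because the first-order Markov assumption makes a label's score depend only on the previous label.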
