A major challenge in structured prediction is to represent the interdependencies within output structures. When outputs are structured as sequences, linear-chain conditional random fields (CRFs) are a widely used model class that can learn \textit{local} dependencies in the output. However, the CRF's Markov assumption makes it impossible to represent distributions with \textit{nonlocal} dependencies, and standard CRFs are unable to respect nonlocal constraints of the data (such as global arity constraints on output labels). We present a generalization of CRFs that can enforce a broad class of constraints, including nonlocal ones, by specifying the space of possible output structures as a regular language $\mathcal{L}$. The resulting regular-constrained CRF (RegCCRF) has the same formal properties as a standard CRF, but assigns zero probability to all label sequences not in $\mathcal{L}$. Notably, RegCCRFs can incorporate their constraints during training, whereas related models enforce constraints only during decoding. We prove that constrained training is never worse than constrained decoding, and show empirically that it can be substantially better in practice. Additionally, we demonstrate a practical benefit on downstream tasks by incorporating a RegCCRF into a deep neural model for semantic role labeling, exceeding state-of-the-art results on a standard dataset.
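To make the idea of restricting a CRF to a regular language concrete, the following is a minimal sketch, not necessarily the construction used in this work. It assumes a toy label set, a hypothetical three-state DFA for the nonlocal constraint "exactly one A", and made-up emission and transition scores (`LABELS`, `dfa_step`, `EMIT`, `TRANS` are all illustrative names). The forward pass runs over (label, DFA state) pairs, so only sequences in $\mathcal{L}$ contribute to the partition function and every other sequence receives probability zero.

```python
import itertools
import math

LABELS = ["A", "O"]

# Hypothetical DFA for the language "exactly one A":
# state = number of As seen so far, capped at 2 (a dead state).
DFA_START = 0
DFA_ACCEPT = {1}

def dfa_step(state, label):
    return min(state + 1, 2) if label == "A" else state

def accepts(seq):
    state = DFA_START
    for label in seq:
        state = dfa_step(state, label)
    return state in DFA_ACCEPT

def score(seq, emit, trans):
    """Unnormalized linear-chain CRF log-score of a label sequence."""
    total = sum(emit[t][y] for t, y in enumerate(seq))
    total += sum(trans[(a, b)] for a, b in zip(seq, seq[1:]))
    return total

def constrained_log_partition(emit, trans):
    """Forward algorithm over (label, DFA state) pairs.

    Only paths ending in an accepting DFA state are summed,
    so the normalizer ranges over the regular language alone.
    """
    alpha = {}
    for y in LABELS:
        key = (y, dfa_step(DFA_START, y))
        alpha[key] = alpha.get(key, 0.0) + math.exp(emit[0][y])
    for t in range(1, len(emit)):
        new_alpha = {}
        for (prev, q), mass in alpha.items():
            for y in LABELS:
                key = (y, dfa_step(q, y))
                w = mass * math.exp(trans[(prev, y)] + emit[t][y])
                new_alpha[key] = new_alpha.get(key, 0.0) + w
        alpha = new_alpha
    return math.log(sum(m for (_, q), m in alpha.items() if q in DFA_ACCEPT))

def brute_force_log_partition(emit, trans):
    """Reference normalizer: enumerate every sequence and keep those in the language."""
    seqs = itertools.product(LABELS, repeat=len(emit))
    return math.log(sum(math.exp(score(s, emit, trans)) for s in seqs if accepts(s)))

def constrained_prob(seq, emit, trans):
    """Probability under the constrained model; zero outside the language."""
    if not accepts(seq):
        return 0.0
    return math.exp(score(seq, emit, trans) - constrained_log_partition(emit, trans))

# Made-up scores for a length-3 input (illustrative values only).
EMIT = [{"A": 0.5, "O": 0.1}, {"A": -0.2, "O": 0.3}, {"A": 1.0, "O": 0.0}]
TRANS = {("A", "A"): -1.0, ("A", "O"): 0.2, ("O", "A"): 0.4, ("O", "O"): 0.0}
```

Because the same constrained normalizer can be used in the training loss, a sketch like this also illustrates the difference between constrained training and constrained decoding: the latter would fit the model with the unconstrained normalizer and apply `accepts` only when searching for the best sequence.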