Semantic role labeling (SRL) has multiple disjoint label sets, e.g., VerbNet and PropBank. Creating these datasets is challenging; therefore, a natural question is how to use each one to help the other. Prior work has shown that cross-task interaction helps, but has so far explored only multitask learning. A common issue with multitask setups is that argument sequences are still decoded separately, running the risk of generating structurally inconsistent label sequences (as per lexicons like Semlink). In this paper, we eliminate this issue with a framework that jointly models VerbNet and PropBank labels as one sequence. In this setup, we show that enforcing Semlink constraints during decoding consistently improves the overall F1. With special input constructions, our joint model infers VerbNet arguments from given PropBank arguments with over 99 F1. For learning, we propose a constrained marginal model that learns with knowledge defined in Semlink to further benefit from the large amounts of PropBank-only data. On the joint benchmark based on CoNLL05, our models achieve state-of-the-art F1 scores, outperforming the prior best in-domain model by 3.5 (VerbNet) and 0.8 (PropBank). For out-of-domain generalization, our models surpass the prior best by 3.4 (VerbNet) and 0.2 (PropBank).
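As a rough illustration of the two Semlink-based ideas above, the Python sketch below shows (a) decoding that admits only Semlink-consistent (VerbNet, PropBank) label pairs, and (b) a constrained marginal loss for PropBank-only data, which sums probability over every VerbNet role that Semlink allows for the gold PropBank label. This is a simplified sketch under stated assumptions, not the paper's model: the toy `SEMLINK` table, the per-token joint-label scores, and the helper names (`consistent`, `constrained_decode`, `constrained_marginal_nll`) are hypothetical, and a real decoder would also enforce span (BIO) structure.

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical toy Semlink-style mapping for a single predicate sense:
# PropBank label -> VerbNet roles it may align with (illustrative only).
SEMLINK: Dict[str, Set[str]] = {
    "ARG0": {"Agent"},
    "ARG1": {"Theme", "Patient"},
    "O": {"O"},
}

Pair = Tuple[str, str]  # (VerbNet role, PropBank label)


def consistent(vn: str, pb: str) -> bool:
    """True if the (VerbNet, PropBank) pair is licensed by the toy SEMLINK table."""
    return vn in SEMLINK.get(pb, set())


def _logsumexp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def constrained_decode(scores: List[Dict[Pair, float]]) -> List[Pair]:
    """Per-token argmax over joint labels, restricted to Semlink-consistent pairs."""
    out = []
    for token_scores in scores:
        allowed = {pair: s for pair, s in token_scores.items() if consistent(*pair)}
        if not allowed:
            raise ValueError("no Semlink-consistent label pair for this token")
        out.append(max(allowed, key=allowed.get))
    return out


def constrained_marginal_nll(scores: List[Dict[Pair, float]], pb_gold: List[str]) -> float:
    """Negative log-likelihood of PropBank-only gold labels, marginalizing over
    the VerbNet roles that Semlink allows for each gold PropBank label."""
    total = 0.0
    for token_scores, pb in zip(scores, pb_gold):
        log_z = _logsumexp(list(token_scores.values()))  # softmax normalizer
        allowed = [s for (vn, p), s in token_scores.items() if p == pb and consistent(vn, p)]
        if not allowed:
            raise ValueError(f"no VerbNet role is consistent with {pb}")
        # log sum_{vn in Semlink(pb)} p(vn, pb | x)
        total -= _logsumexp(allowed) - log_z
    return total


# Toy scores: the unconstrained argmax would pick the inconsistent
# pairs ("Theme", "ARG0") and ("Agent", "ARG1").
toy_scores: List[Dict[Pair, float]] = [
    {("Agent", "ARG0"): 2.0, ("Theme", "ARG0"): 3.0, ("O", "O"): 0.5},
    {("Theme", "ARG1"): 1.0, ("Agent", "ARG1"): 4.0, ("Patient", "ARG1"): 0.5, ("O", "O"): 0.0},
]

if __name__ == "__main__":
    print(constrained_decode(toy_scores))
    print(constrained_marginal_nll(toy_scores, ["ARG0", "ARG1"]))
```

On the toy scores, the constrained decoder returns `("Agent", "ARG0")` and `("Theme", "ARG1")` instead of the higher-scoring but Semlink-inconsistent pairs. For the second token, the marginal loss sums over both `Theme` and `Patient`, because the toy table allows either role for `ARG1`.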