We present a family of encodings for sequence-labeling dependency parsing, based on the concept of hierarchical bracketing. We prove that the existing 4-bit projective encoding belongs to this family, but that it is suboptimal in the number of labels it uses to encode a tree. We derive an optimal hierarchical bracketing, which minimizes the number of symbols used and encodes projective trees with only 12 distinct labels (vs. 16 for the 4-bit encoding). We also extend optimal hierarchical bracketing to support arbitrary non-projectivity more compactly than previous encodings. Our new encodings yield competitive accuracy on a diverse set of treebanks.
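To make the label count concrete, the following is a minimal sketch of a 4-bit projective labeling, in which each word receives one of 2^4 = 16 labels. The bit semantics used here (right dependent, outermost dependent on its side, has left dependents, has right dependents) are an assumption about the 4-bit scheme and are not stated in this abstract. The function name `encode_4bit`, the treatment of the root as a right dependent of a dummy node at position 0, and the example sentence are likewise illustrative choices. The 12-label optimal hierarchical bracketing is not reproduced here, since the abstract does not define it.

```python
from itertools import product

# Every 4-bit label, regardless of whether a tree ever uses it: 2**4 = 16.
LABEL_SPACE = list(product([False, True], repeat=4))


def encode_4bit(heads):
    """Assign one 4-bit label per word (assumed bit semantics, see lead-in).

    heads[i - 1] is the 1-based position of the head of word i; 0 marks the
    root, treated here as a dependent of a dummy node at position 0.
    """
    n = len(heads)
    labels = []
    for i in range(1, n + 1):
        h = heads[i - 1]
        # Bit 0: the word is a right dependent (its head lies to its left).
        is_right_dep = h < i
        # Siblings attached to the same head on the same side as word i.
        same_side = [
            d for d in range(1, n + 1)
            if heads[d - 1] == h and ((d > h) == (i > h))
        ]
        # Bit 1: the word is the dependent farthest from the head on its side.
        outermost = i == (max(same_side) if i > h else min(same_side))
        # Bits 2 and 3: the word has dependents to its left / to its right.
        has_left = any(heads[d - 1] == i and d < i for d in range(1, n + 1))
        has_right = any(heads[d - 1] == i and d > i for d in range(1, n + 1))
        labels.append((is_right_dep, outermost, has_left, has_right))
    return labels


# "She reads books": "reads" is the root; "She" and "books" depend on it.
example = encode_4bit([2, 0, 2])
print(len(LABEL_SPACE))  # 16
print(example)
```

Under these assumptions, each word's label depends only on the tree and the word's position, so a standard sequence tagger can predict one label per token. The claim in the abstract is that 12 labels suffice for the same projective trees.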