Span-based models are among the most straightforward approaches to named entity recognition (NER). Existing span-based NER systems aggregate token representations into span representations only shallowly. This typically makes them much less effective on long-span entities, couples the representations of overlapping spans, and ultimately degrades performance. In this study, we propose DSpERT (Deep Span Encoder Representations from Transformers), which comprises a standard Transformer and a span Transformer. The latter uses lower-layer span representations as queries and aggregates token representations as keys and values, layer by layer from bottom to top. DSpERT thus produces span representations with deep semantics. With weights initialized from pretrained language models, DSpERT achieves performance higher than or competitive with recent state-of-the-art systems on eight NER benchmarks. Experimental results verify the importance of depth for span representations and show that DSpERT performs particularly well on long-span entities and nested structures. Furthermore, the deep span representations are well structured and easily separable in the feature space.
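The layer-by-layer aggregation described above can be illustrated with a minimal sketch. It is not the DSpERT implementation: it assumes single-head scaled dot-product attention with no learned projections, no feed-forward sublayer, and no layer normalization, and it assumes the initial span query is obtained by max-pooling the bottom-layer token vectors. The names `attend` and `deep_span_representation` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def deep_span_representation(token_layers, start, end):
    """Build a span vector by attending over token vectors layer by layer.

    token_layers[l][t] is the vector of token t at layer l (bottom layer first).
    The span covers tokens in the half-open range [start, end).
    """
    # Assumption: the initial span query is a max-pool of the bottom-layer tokens.
    bottom = token_layers[0][start:end]
    span = [max(column) for column in zip(*bottom)]

    # From bottom to top: the span vector from the layer below is the query,
    # and the in-span token vectors of the current layer are keys and values.
    for layer in token_layers[1:]:
        tokens = layer[start:end]
        context = attend(span, tokens, tokens)
        # Residual connection so that information from lower layers is kept.
        span = [s + c for s, c in zip(span, context)]
    return span


# Toy usage: 3 layers, 4 tokens, 2-dimensional vectors; the span is tokens 1..2.
layers = [
    [[0.1, 0.2], [0.3, 0.1], [0.5, 0.4], [0.0, 0.9]],
    [[0.2, 0.1], [0.4, 0.3], [0.6, 0.2], [0.1, 0.8]],
    [[0.3, 0.3], [0.5, 0.5], [0.7, 0.1], [0.2, 0.7]],
]
span_vector = deep_span_representation(layers, 1, 3)
```

In this sketch each span has its own query and only attends to its own tokens, so the vectors of two overlapping spans are computed separately at every layer rather than being pooled once from a shared top layer.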