End-to-End Entity Detection with Proposer and Regressor

Named entity recognition is a traditional task in natural language processing. In particular, nested entity recognition has received extensive attention because nested entities are widespread. Recent research adapts the well-established set-prediction paradigm from object detection to handle entity nesting. However, these approaches are limited by manually created query vectors, which fail to adapt to the rich semantic information in the context. This paper presents an end-to-end entity detection approach with a proposer and a regressor to address these issues. First, the proposer uses a feature pyramid network to generate high-quality entity proposals. Then, the regressor refines the proposals to produce the final predictions. The model adopts an encoder-only architecture and thus gains semantically rich queries, precise entity localization, and ease of training. Moreover, we introduce a novel spatially modulated attention and progressive refinement for further improvement. Extensive experiments demonstrate that our model achieves advanced performance in flat and nested NER, reaching a new state-of-the-art F1 score of 80.74 on the GENIA dataset and 72.38 on the WeiboNER dataset.
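
To make the set-prediction framing and the reported F1 scores concrete, here is a minimal sketch of how nested entities can be represented as a set of labeled spans and scored. It assumes exact-match span scoring, a common NER convention; the abstract does not state the evaluation protocol, and the tokens, spans, labels, and the `span_f1` helper are hypothetical illustrations rather than the paper's code or data.

```python
from typing import Set, Tuple

# A span is (start token index, end token index inclusive, entity label).
Span = Tuple[int, int, str]


def span_f1(gold: Set[Span], pred: Set[Span]) -> float:
    """Exact-match F1 between two sets of labeled spans (assumed metric)."""
    true_positives = len(gold & pred)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentence with nesting: a short span lies inside a longer one.
tokens = ["human", "IL-2", "gene", "expression"]
gold = {(1, 1, "protein"), (0, 2, "DNA")}

# Hypothetical prediction: the inner span is right, the outer boundary is off.
pred = {(1, 1, "protein"), (0, 3, "DNA")}

score = span_f1(gold, pred)
print(f"span-level F1: {score:.2f}")
```

Because entities are treated as a set of spans rather than one tag per token, overlapping or nested spans need no special handling, which is the property that makes set prediction attractive for nested NER.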

