We address named entity omission, a drawback of many current abstractive text summarizers. We propose a custom pretraining objective that strengthens the model's attention to the named entities in a text. First, a RoBERTa model is trained for named entity recognition (NER) to detect the named entities in the text. This model is then used to mask those entities, and a BART model is trained to reconstruct them. Finally, the BART model is fine-tuned on the summarization task. Our experiments show that this pretraining approach improves the precision and recall of named entity inclusion in the generated summaries.
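The entity-masking step and an entity-inclusion metric can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Entity spans are assumed to be character offsets already produced by the RoBERTa NER model. Each entity is replaced by a single BART `<mask>` token, although the abstract does not specify the masking granularity. The inclusion metric is assumed to be set-based precision and recall over entity strings. The function names `mask_entities` and `entity_precision_recall` are hypothetical.

```python
from typing import List, Set, Tuple

MASK_TOKEN = "<mask>"  # BART's mask token string

# Hypothetical span format: (start, end) character offsets, end exclusive,
# assumed to come from an upstream NER model (e.g. a RoBERTa token classifier).
Span = Tuple[int, int]


def mask_entities(text: str, spans: List[Span],
                  mask_token: str = MASK_TOKEN) -> Tuple[str, str]:
    """Replace every entity span with one mask token (assumed granularity).

    Returns (corrupted_input, target). The target is the original text,
    which a seq2seq denoiser such as BART is trained to reconstruct.
    """
    pieces: List[str] = []
    cursor = 0
    for start, end in sorted(spans):
        if start < cursor:
            raise ValueError("overlapping entity spans are not supported")
        pieces.append(text[cursor:start])  # text before the entity
        pieces.append(mask_token)          # entity hidden from the model
        cursor = end
    pieces.append(text[cursor:])           # trailing text after the last entity
    return "".join(pieces), text


def entity_precision_recall(summary_entities: Set[str],
                            reference_entities: Set[str]) -> Tuple[float, float]:
    """Assumed set-based metric: exact string matches between entity sets."""
    overlap = len(summary_entities & reference_entities)
    precision = overlap / len(summary_entities) if summary_entities else 0.0
    recall = overlap / len(reference_entities) if reference_entities else 0.0
    return precision, recall


if __name__ == "__main__":
    text = "Barack Obama visited Paris in 2015."
    corrupted, target = mask_entities(text, [(0, 12), (21, 26)])
    print(corrupted)  # <mask> visited <mask> in 2015.
    print(entity_precision_recall({"Obama", "Paris"}, {"Obama", "Paris", "2015"}))
```

Collapsing each entity to a single mask token mirrors BART's text-infilling noise, in which a whole span maps to one `<mask>`, so the model must recover both the entity's content and its length. Exact string matching in the metric is a simplification; a real evaluation might normalize entity surface forms first.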