Keyphrase generation is the task of identifying a set of phrases that best represent the main topics or themes of a given text. Keyphrases are divided into present keyphrases, which appear in the text, and absent keyphrases, which do not. Recent approaches based on sequence-to-sequence models have proven effective for absent keyphrase generation. However, performance remains limited because absent keyphrases are difficult to find. In this paper, we propose Keyphrase-Focused BART, which exploits the differences between present and absent keyphrase generation by fine-tuning two separate BART models, one for present keyphrases and one for absent keyphrases. We further present effective approaches to keyphrase shuffling and candidate keyphrase ranking. For absent keyphrases, Keyphrase-Focused BART achieves a new state-of-the-art F1@5 score on two of five keyphrase generation benchmark datasets.
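To make the present/absent distinction and the F1@5 metric concrete, the following is a minimal sketch rather than the evaluation code used in this work. The function names `split_keyphrases` and `f1_at_k` are hypothetical, matching is done on lowercased, whitespace-normalized strings without stemming, and precision is computed over the predictions actually returned (at most k). Benchmark evaluations commonly apply stemming and may define F1@5 slightly differently.

```python
def split_keyphrases(text, keyphrases):
    """Partition keyphrases into present (found in the text) and absent.

    Assumption: a keyphrase counts as present if its lowercased,
    whitespace-normalized form occurs in the text on space boundaries.
    No stemming or punctuation stripping is applied here.
    """
    normalized_text = " ".join(text.lower().split())
    present, absent = [], []
    for phrase in keyphrases:
        normalized_phrase = " ".join(phrase.lower().split())
        # Pad with spaces so that matches fall on word boundaries.
        if f" {normalized_phrase} " in f" {normalized_text} ":
            present.append(phrase)
        else:
            absent.append(phrase)
    return present, absent


def f1_at_k(predicted, gold, k=5):
    """Compute F1 over the top-k predictions against the gold keyphrases.

    Assumption: exact match after lowercasing; duplicates in the
    top-k predictions are counted once when matching.
    """
    top_k = predicted[:k]
    gold_set = {g.lower() for g in gold}
    correct = len({p.lower() for p in top_k} & gold_set)
    if correct == 0:
        return 0.0
    precision = correct / len(top_k)
    recall = correct / len(gold_set)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    document = "Keyphrase generation with sequence-to-sequence models."
    gold = ["keyphrase generation", "neural networks"]
    present, absent = split_keyphrases(document, gold)
    print("present:", present)
    print("absent:", absent)
    print("F1@5:", f1_at_k(["a", "b", "c", "d", "e"], ["a", "x"]))
```

Under this sketch, absent keyphrases would be scored separately from present ones by calling `f1_at_k` on each partition with the corresponding model's predictions.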