Traditional keyphrase prediction methods predict a single set of keyphrases per document, failing to cater to the diverse needs of users and downstream applications. To bridge the gap, we introduce on-demand keyphrase generation, a novel paradigm that requires keyphrases that conform to specific high-level goals or intents. For this task, we present MetaKP, a large-scale benchmark comprising four datasets, 7500 documents, and 3760 goals across news and biomedical domains with human-annotated keyphrases. Leveraging MetaKP, we design both supervised and unsupervised methods, including a multi-task fine-tuning approach and a self-consistency prompting method with large language models. The results highlight the challenges of supervised fine-tuning, whose performance is not robust to distribution shifts. By contrast, the proposed self-consistency prompting approach greatly improves the performance of large language models, enabling GPT-4o to achieve 0.548 SemF1, surpassing the performance of a fully fine-tuned BART-base model. Finally, we demonstrate the potential of our method to serve as a general NLP infrastructure, exemplified by its application in epidemic event detection from social media.