LLM-TAKE: Theme Aware Keyword Extraction Using Large Language Models

Reza Yousefi Maragheh, Chenhao Fang, Charan Chand Irugu, Parth Parikh, Jason Cho, Jianpeng Xu, Saranyan Sukumar, Malay Patel, Evren Korpeoglu, Sushant Kumar, Kannan Achan

Keyword extraction is one of the core tasks in natural language processing. Classic extraction models are notorious for having a short attention span, which makes it hard for them to infer relational connections among words and sentences that are far from each other. This, in turn, makes them impractical for generating keywords that must be inferred from the context of the whole text. In this paper, we explore using Large Language Models (LLMs) to generate keywords for items, inferred from the items' textual metadata. Our modeling framework includes several stages that refine the results by avoiding keywords that are non-informative or sensitive and by reducing the hallucinations common in LLMs. We call our LLM-based framework Theme-Aware Keyword Extraction (LLM-TAKE). We propose two variations of the framework for generating extractive and abstractive themes for products in an e-commerce setting. We perform an extensive set of experiments on three real data sets and show that our modeling framework can improve accuracy-based and diversity-based metrics when compared with benchmark models.
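
The abstract describes a multi-stage pipeline that screens LLM output but does not say how each stage works. As a minimal sketch under stated assumptions, the Python below post-filters candidate keywords returned by an LLM. It removes duplicates, drops terms found on a hypothetical sensitive list or a hypothetical non-informative list, and, for the extractive variant, keeps only keywords whose tokens all occur in the item's text (one simple guard against hallucinated keywords). For the abstractive variant it can optionally restrict output to a caller-supplied theme vocabulary. The names `filter_themes`, `is_grounded`, `SENSITIVE_TERMS`, `NON_INFORMATIVE_TERMS`, and `allowed_themes` are illustrative assumptions, not part of LLM-TAKE.

```python
import re
from typing import Iterable, List, Optional, Set

# Hypothetical filter lists for illustration only; the abstract does not say
# which terms LLM-TAKE treats as sensitive or non-informative.
SENSITIVE_TERMS: Set[str] = {"weapon", "alcohol"}
NON_INFORMATIVE_TERMS: Set[str] = {"product", "item", "great", "quality", "new"}


def normalize(keyword: str) -> str:
    """Lowercase a keyword and collapse runs of whitespace."""
    return " ".join(keyword.lower().split())


def tokens(text: str) -> List[str]:
    """Split text into lowercase alphanumeric tokens (hyphens act as separators)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def is_grounded(keyword: str, text: str) -> bool:
    """Extractive guard: every token of the keyword must occur in the source text."""
    text_tokens = set(tokens(text))
    kw_tokens = tokens(keyword)
    return bool(kw_tokens) and all(tok in text_tokens for tok in kw_tokens)


def filter_themes(
    candidates: Iterable[str],
    text: str,
    extractive: bool = True,
    allowed_themes: Optional[Iterable[str]] = None,
) -> List[str]:
    """Post-filter LLM keyword candidates (hypothetical pipeline stage)."""
    allowed = None if allowed_themes is None else {normalize(t) for t in allowed_themes}
    kept: List[str] = []
    seen: Set[str] = set()
    for raw in candidates:
        kw = normalize(raw)
        if not kw or kw in seen:
            continue  # skip empty strings and duplicates
        if set(tokens(kw)) & SENSITIVE_TERMS:
            continue  # drop keywords containing a sensitive term
        if kw in NON_INFORMATIVE_TERMS:
            continue  # drop generic, non-informative keywords
        if extractive and not is_grounded(kw, text):
            continue  # extractive mode: drop keywords not supported by the text
        if not extractive and allowed is not None and kw not in allowed:
            continue  # abstractive mode: keep only themes from the vocabulary
        seen.add(kw)
        kept.append(kw)
    return kept


if __name__ == "__main__":
    description = (
        "Stainless steel water bottle, vacuum insulated, keeps drinks cold "
        "for 24 hours. Great for hiking and camping."
    )
    llm_output = ["Hiking", "camping", "great", "vacuum insulated", "eco friendly", "hiking"]
    print(filter_themes(llm_output, description))
    # -> ['hiking', 'camping', 'vacuum insulated']
    print(filter_themes(llm_output, description, extractive=False,
                        allowed_themes={"Outdoor Gear", "hiking", "Eco Friendly"}))
    # -> ['hiking', 'eco friendly']
```

The token-level grounding check is deliberately crude: it accepts a keyword whose words occur anywhere in the text, even if they are not adjacent, so a production system would likely use stricter phrase matching or a model-based check.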

Related content

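MoDELS

The 23rd ACM/IEEE International Conference on Model Driven Engineering Languages and Systems is the premier conference series for model-driven software and systems engineering, organized with the support of ACM SIGSOFT and IEEE TCSE. Since 1998, MODELS has covered all aspects of modeling, from languages and methods to tools and applications. MODELS attendees come from diverse backgrounds, including researchers, academics, engineers, and industry professionals. MODELS 2019 is a forum where participants can exchange cutting-edge research results and innovative practical experience in modeling and model-driven software and systems. This year's edition will give the modeling community the opportunity to further advance the foundations of modeling and to present innovative applications of modeling in emerging areas such as cyber-physical systems, embedded systems, socio-technical systems, cloud computing, big data, machine learning, security, open source, and sustainability. Official website: http://www.modelsconference.org/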

[CVPR 2022] Robust Equivariant Imaging: a fully unsupervised framework for learning to image from noisy and partial measurements (March 3, 2022)

UCM "Introduction to Machine Learning" lecture notes, 80-page PDF: CSE176 Introduction to Machine Learning (September 29, 2021)

[Amazon, WWW 2020] Don't Parse, Generate! A Sequence to Sequence Architecture for Task-Oriented Semantic Parsing (February 1, 2020)

FlowQA: Grasping Flow in History for Conversational Machine Comprehension (October 18, 2019)