Traditional topic models, such as neural topic models (NTMs), rely on inference and generation networks to learn latent topic distributions. This paper explores a new paradigm for topic modeling in the era of large language models (LLMs), reframing topic modeling as a long-form generation task and updating its definition accordingly. We propose a simple but practical approach that implements LLM-based topic modeling out of the box: sample a subset of the data, generate topics and representative text with our prompt, and assign texts to topics via keyword matching. We then investigate whether this long-form generation paradigm can outperform NTMs under zero-shot prompting. We conduct a systematic comparison between NTMs and LLMs in terms of topic quality and empirically examine the claim that "a majority of NTMs are outdated."
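The three-step pipeline above (subset sampling, prompted topic generation, keyword-based assignment) can be sketched in a few dozen lines. The sketch below is illustrative only: the `llm` callable, the prompt wording, and the line-based output format are assumptions, not the paper's exact prompt or parser.

```python
import random
from typing import Callable, Dict, List, Optional

def sample_subset(corpus: List[str], k: int = 100, seed: int = 0) -> List[str]:
    # Step 1: sample a manageable subset of documents to fit the LLM context.
    rng = random.Random(seed)
    return rng.sample(corpus, min(k, len(corpus)))

# Hypothetical prompt; the paper's actual prompt may differ.
TOPIC_PROMPT = (
    "Read the documents below and produce {n} topics. For each topic, "
    "output one line in the form 'topic_name: keyword1, keyword2, ...'.\n\n"
    "Documents:\n{docs}"
)

def generate_topics(llm: Callable[[str], str],
                    subset: List[str],
                    n_topics: int = 10) -> Dict[str, List[str]]:
    # Step 2: one zero-shot prompt asks the LLM for topics plus keywords.
    # `llm` stands in for any text-completion backend (an assumption here).
    raw = llm(TOPIC_PROMPT.format(n=n_topics, docs="\n".join(subset)))
    topics: Dict[str, List[str]] = {}
    for line in raw.splitlines():
        if ":" in line:
            name, kws = line.split(":", 1)
            topics[name.strip()] = [w.strip().lower()
                                    for w in kws.split(",") if w.strip()]
    return topics

def assign_documents(corpus: List[str],
                     topics: Dict[str, List[str]]) -> List[Optional[str]]:
    # Step 3: assign each document to the topic with the most keyword hits.
    assignments: List[Optional[str]] = []
    for doc in corpus:
        tokens = set(doc.lower().split())
        best = max(topics,
                   key=lambda t: len(tokens & set(topics[t])),
                   default=None)
        assignments.append(best)
    return assignments
```

A usage pass would chain the three steps: `assign_documents(corpus, generate_topics(llm, sample_subset(corpus)))`. Keyword matching here is a simple bag-of-words overlap; any overlap-based matcher would serve the same role in this sketch.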