Topic modeling has been a widely used tool for unsupervised text analysis. However, comprehensively evaluating a topic model remains challenging. Existing evaluation methods are either less comparable across different models (e.g., perplexity) or focus on only one specific aspect of a model at a time (e.g., topic quality or document representation quality), which is insufficient to reflect overall model performance. In this paper, we propose WALM (Word Agreement with Language Model), a new evaluation method for topic modeling that jointly considers the semantic quality of document representations and topics, leveraging the power of large language models (LLMs). Through extensive experiments involving different types of topic models, WALM is shown to align with human judgment and can serve as a complement to existing evaluation methods, bringing a new perspective to topic modeling. Our software package is available at https://github.com/Xiaohao-Yang/Topic_Model_Evaluation.
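At a high level, WALM compares words associated with a document by an LLM against words derived from the topic model's output for that document. The paper defines the actual scoring; the following is only a minimal sketch, assuming agreement is measured as simple set overlap between the top-k words from each side (the function name and the Jaccard-style overlap are illustrative choices, not the paper's formulation):

```python
def word_agreement(llm_words, model_words, k=10):
    """Illustrative agreement score between two keyword lists.

    llm_words:   words an LLM associates with a document (hypothetical input)
    model_words: top words derived from the topic model for that document
    Returns the Jaccard overlap of the two top-k word sets, in [0, 1].
    """
    a, b = set(llm_words[:k]), set(model_words[:k])
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Two of four distinct words are shared, so the overlap is 0.5.
score = word_agreement(["topic", "model", "text"], ["topic", "text", "corpus"])
print(score)  # 0.5
```

In practice, exact string matching is brittle; a semantic variant (e.g., matching via word embeddings) would credit near-synonyms that a set intersection misses.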