Recent work uses Large Language Models (LLMs) for topic modeling, generating comprehensible topic labels for given documents. However, their performance has mainly been evaluated qualitatively, and there remains room for quantitative investigation of their capabilities. In this paper, we quantitatively evaluate LLMs from multiple perspectives: the quality of the generated topics, the impact of LLM-specific concerns such as hallucination and shortcuts that rely on only part of a document, and the controllability of topic categories via prompts. Our findings show that LLMs can identify coherent and diverse topics with few hallucinations, but may take shortcuts by focusing on only parts of documents. We also found that their controllability is limited.