Topic modeling is a research field with a growing range of applications, historically from document retrieval to sentiment analysis and text summarization. Large Language Models (LLMs) are currently a major trend in text processing, yet few works study their usefulness for this task. Formal Concept Analysis (FCA) has recently been proposed as a candidate for topic modeling, but no real applied case study has been conducted. In this work, we compare LLMs and FCA to better understand their strengths and weaknesses for topic modeling. FCA is evaluated through the CREA pipeline used in past experiments on topic modeling and visualization, whereas GPT-5 represents the LLM approach. A three-prompt strategy is applied with GPT-5 in a zero-shot setup: topic generation from document batches, merging of batch results into final topics, and topic labeling. A first experiment reuses the teaching materials previously used to evaluate CREA, while a second experiment analyzes 40 research articles in information systems to compare the extracted topics with the underlying subfields.
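The three-prompt strategy can be sketched as a simple pipeline. This is a minimal illustration, not the authors' implementation: `call_llm` is a hypothetical stand-in for a GPT-5 API call, stubbed here so the control flow runs offline, and the prompt wordings and batch size are assumptions.

```python
# Hedged sketch of a three-prompt zero-shot topic-modeling strategy:
# (1) topic generation per document batch, (2) merging batch results
# into final topics, (3) topic labeling.

def call_llm(prompt: str) -> str:
    # Hypothetical stub: a real implementation would query GPT-5 here.
    return "stub-response"

def batches(docs: list[str], size: int):
    # Split the document list into consecutive batches of `size`.
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

def extract_topics(docs: list[str], batch_size: int = 10) -> tuple[str, str]:
    # Prompt 1: generate candidate topics from each document batch.
    batch_topics = [
        call_llm("List the main topics of these documents:\n" + "\n".join(b))
        for b in batches(docs, batch_size)
    ]
    # Prompt 2: merge the per-batch topic lists into one final topic set.
    merged = call_llm(
        "Merge these topic lists into a single deduplicated list:\n"
        + "\n".join(batch_topics)
    )
    # Prompt 3: assign a short label to each final topic.
    labels = call_llm("Give a short label for each of these topics:\n" + merged)
    return merged, labels
```

The batching step keeps each first-stage prompt within the model's context window; the merge step is what turns independent batch outputs into a coherent topic set.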