Analyzing textual data is a cornerstone of qualitative research. While traditional methods such as grounded theory and content analysis are widely used, they are labor-intensive and time-consuming. Topic modeling offers an automated complement. Yet existing approaches, including LLM-based topic modeling, still struggle with high data-preprocessing requirements, limited interpretability, and limited reliability. This paper introduces Agentic Retrieval-Augmented Generation (Agentic RAG) as a method for topic modeling with LLMs. It integrates three key components: (1) retrieval, which enables automated access to external data beyond an LLM's pre-trained knowledge; (2) generation, which leverages LLM capabilities for text synthesis; and (3) agent-driven learning, which iteratively refines the retrieval and query-formulation processes. To empirically validate Agentic RAG for topic modeling, we reanalyze a Twitter/X dataset previously examined by Mu et al. (2024a). Our findings demonstrate that, compared with both the standard machine learning approach and LLM prompting for topic modeling, the approach is more efficient and more interpretable while also achieving higher reliability and validity. These results highlight Agentic RAG's ability to generate semantically relevant and reproducible topics, positioning it as a robust, scalable, and transparent alternative for AI-driven qualitative research in leadership, managerial, and organizational research.
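The interplay of the three components (retrieval, generation, and agent-driven refinement) can be illustrated with a deliberately simplified sketch. It is not the authors' implementation, and every name in it is hypothetical: the tiny corpus is invented (it is not the Mu et al. (2024a) data), retrieval is plain keyword overlap, `generate_topic` labels a topic by word frequency as a placeholder for an LLM call, and `expand_query` uses pseudo-relevance feedback query expansion as a stand-in for the agent's query reformulation. The loop stops once a refined query no longer changes the retrieved evidence.

```python
import re
from collections import Counter

# Hypothetical toy corpus standing in for short posts; invented for illustration only.
DOCS = [
    "remote work improves employee wellbeing",
    "employee wellbeing depends on leadership support",
    "leadership support matters for remote teams",
    "stock prices fell sharply today",
    "stock market volatility worries investors",
]
STOPWORDS = {"the", "a", "and", "of", "to", "in", "is", "for", "on", "with"}


def tokenize(text):
    """Lowercased word tokens with a small stopword list removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def retrieve(query, docs, k=3):
    """Retrieval: rank documents by the number of query terms they contain (toy lexical scorer)."""
    scored = [(len(set(query) & set(tokenize(d))), i) for i, d in enumerate(docs)]
    scored.sort(key=lambda s: (-s[0], s[1]))  # highest overlap first, ties by position
    return [docs[i] for score, i in scored[:k] if score > 0]


def expand_query(query, retrieved, n_new=2):
    """Query reformulation (stand-in for the agent): add the most frequent new terms
    from the retrieved documents, i.e. pseudo-relevance feedback."""
    counts = Counter(t for d in retrieved for t in tokenize(d) if t not in query)
    return query + [w for w, _ in counts.most_common(n_new)]


def generate_topic(retrieved, n_terms=3):
    """Generation placeholder: a real system would prompt an LLM to name the topic;
    here the label is simply the most frequent content words."""
    counts = Counter(t for d in retrieved for t in tokenize(d))
    return [w for w, _ in counts.most_common(n_terms)]


def agentic_rag(seed_query, docs, k=3, n_new=2, max_iters=5):
    """Retrieve, refine the query, and re-retrieve until the evidence stabilizes,
    then generate a topic label from the final retrieved set."""
    query = tokenize(seed_query)
    retrieved = retrieve(query, docs, k)
    for _ in range(max_iters):
        new_query = expand_query(query, retrieved, n_new)
        new_retrieved = retrieve(new_query, docs, k)
        if new_retrieved == retrieved:  # refinement adds no new evidence: stop
            break
        query, retrieved = new_query, new_retrieved
    return generate_topic(retrieved), query, retrieved


if __name__ == "__main__":
    label, query, retrieved = agentic_rag("remote work", DOCS)
    print("topic label:", label)
    print("final query:", query)
    print("evidence:", retrieved)
```

Starting from the seed query "remote work", the first retrieval misses the post about employee wellbeing and leadership support; one round of query expansion pulls it in, while the unrelated stock-market posts are never retrieved. In the approach the abstract describes, the retrieval, labeling, and reformulation steps would be carried out with an LLM rather than these lexical heuristics.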