This paper presents a novel methodological framework for detecting and classifying latent constructs, including frames, narratives, and topics, in textual data using open-source large language models (LLMs). The proposed hybrid approach combines automated summarization with human-in-the-loop validation to improve the accuracy and interpretability of construct identification. By coupling iterative sampling with expert refinement, the framework strengthens methodological robustness and conceptual precision. Applied to diverse data sets, including AI policy debates, newspaper articles on encryption, and the 20 Newsgroups data set, the approach proves versatile across the systematic analysis of complex political discourse, media framing, and topic classification.