Qualitative analysis of textual content unpacks rich and valuable information by assigning labels to the data. However, this process is often labor-intensive, particularly when working with large datasets. While recent AI-based tools demonstrate utility, researchers may not have readily available AI resources and expertise, and they are further challenged by the limited generalizability of those task-specific models. In this study, we explored the use of large language models (LLMs) to support deductive coding, a major category of qualitative analysis in which researchers use pre-determined codebooks to label the data with a fixed set of codes. Instead of training task-specific models, a pre-trained LLM can be used directly for various tasks through prompt learning, without fine-tuning. Using a curiosity-driven question coding task as a case study, we found that, by combining GPT-3 with expert-drafted codebooks, our proposed approach achieved fair to substantial agreement with expert-coded results. We lay out challenges and opportunities in using LLMs to support qualitative coding and beyond.
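As a rough illustration of the workflow summarized above, the following sketch shows how a codebook might be folded into a prompt and how model-assigned codes could be compared with expert codes. It is not the implementation used in this study. The function names `build_prompt` and `cohen_kappa`, the prompt wording, and the two-code codebook are hypothetical, and the use of Cohen's kappa as the agreement measure is an assumption made here because "fair" and "substantial" are conventional labels for kappa ranges. The call to the LLM itself is left out; the model labels below are placeholders, not results.

```python
from collections import Counter


def build_prompt(codebook, text):
    """Assemble a codebook-centered prompt for one item (hypothetical wording)."""
    lines = ["Assign a code from the codebook to the question.", "Codebook:"]
    for code, description in codebook.items():
        lines.append(f"- {code}: {description}")
    lines.append(f"Question: {text}")
    lines.append("Code:")
    return "\n".join(lines)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two equal-length label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same code by both coders.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each coder's marginal rate, summed over codes.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical code throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical codebook; the codes used in the study are not given in this text.
codebook = {
    "fact": "asks for a specific piece of information",
    "explanation": "asks why or how something happens",
}
prompt = build_prompt(codebook, "Why is the sky blue?")

# Placeholder labels standing in for expert codes and parsed model output.
expert_codes = ["fact", "fact", "explanation", "explanation"]
model_codes = ["fact", "fact", "explanation", "fact"]
kappa = cohen_kappa(expert_codes, model_codes)
```

In practice, `prompt` would be sent to the model once per item and the returned text mapped back to a code before computing agreement; how that parsing is done depends on the model and prompt format.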