Grounded theory offers deep insights from qualitative data, but its reliance on expert-intensive manual coding creates a major scalability bottleneck. Existing computational tools either fall short of full automation or lack flexible schema construction. We introduce LOGOS, an end-to-end framework that fully automates the grounded theory workflow, transforming raw text into a structured, hierarchical theory. LOGOS integrates LLM-driven coding, semantic clustering, graph reasoning, and a novel iterative refinement process to build highly reusable codebooks. To ensure fair comparison, we also introduce a principled 5-dimensional metric and a train-test split protocol for standardized, unbiased evaluation. Across five diverse corpora, LOGOS consistently outperforms strong baselines and achieves an average $80.4\%$ alignment with an expert-developed schema on complex datasets. These results demonstrate the potential of LOGOS to democratize and scale qualitative research without sacrificing theoretical nuance.