Large Language Models (LLMs) have shown capabilities close to human performance in various analytical tasks, leading researchers to use them for time- and labor-intensive analyses. However, their ability to handle highly specialized and open-ended tasks in domains like policy studies remains in question. This paper investigates the efficiency and accuracy of LLMs in specialized tasks through a structured user study focusing on Human-LLM partnership. The study, conducted in two stages (Topic Discovery and Topic Assignment), integrates LLMs with expert annotators to observe the impact of LLM suggestions on what is usually human-only analysis. Results indicate that LLM-generated topic lists overlap substantially with human-generated topic lists, though they occasionally miss document-specific topics. LLM suggestions can significantly improve task completion speed, but they also introduce anchoring bias, potentially affecting the depth and nuance of the analysis and raising a critical question about the trade-off between increased efficiency and the risk of biased analysis.