As LLMs make their way into many aspects of our lives, one area that warrants closer scrutiny is scientific research. Using LLMs to generate or analyze data for research purposes is gaining popularity. But when such applications are marred by ad hoc decisions and engineering solutions, we need to be concerned about how they may affect that research, its findings, and any future work that builds on it. We need a more scientific approach to using LLMs in our research. While there are several active efforts to support more systematic construction of prompts, they often focus on achieving desirable outcomes rather than on producing replicable and generalizable knowledge with sufficient transparency, objectivity, or rigor. This article presents a new methodology, inspired by codebook construction in qualitative methods, to address that gap. Using humans in the loop and a multi-phase verification process, this methodology lays a foundation for a more systematic, objective, and trustworthy way of applying LLMs to data analysis. Specifically, we show how a set of researchers can work through a rigorous process of labeling, deliberating, and documenting to remove subjectivity and bring transparency and replicability to the prompt generation process. A set of experiments is presented to show how this methodology can be put into practice.