As LLMs make their way into many aspects of our lives, one area that warrants increased scrutiny is their use in scientific research. Using LLMs to generate or analyze data for research purposes is gaining popularity. But when such applications are marred by ad-hoc decisions and engineering workarounds, we must be concerned about how they may affect the research, its findings, and any future work built on it. We need a more scientific approach to using LLMs in research. While there are several active efforts to support more systematic prompt construction, they tend to focus on achieving desirable outcomes rather than on producing replicable and generalizable knowledge with sufficient transparency, objectivity, or rigor. This article presents a new methodology, inspired by codebook construction in qualitative research, to address that gap. Using humans in the loop and a multi-phase verification process, this methodology lays a foundation for a more systematic, objective, and trustworthy way of applying LLMs to data analysis. Specifically, we show how a set of researchers can work through a rigorous process of labeling, deliberating, and documenting to remove subjectivity and bring transparency and replicability to the prompt-generation process.