We introduce QSTN, an open-source Python framework for systematically generating responses to questionnaire-style prompts, supporting in-silico surveys and annotation tasks with large language models (LLMs). QSTN enables robust evaluation of questionnaire presentation, prompt perturbations, and response generation methods. Our extensive evaluation (>40 million survey responses) shows that question structure and response generation methods have a significant impact on how well generated survey responses align with human answers. We also find that, by changing the presentation method, answers can be obtained at a fraction of the compute cost. In addition, we offer a no-code user interface that allows researchers to set up robust experiments with LLMs \emph{without coding knowledge}. We hope that QSTN will support the reproducibility and reliability of LLM-based research in the future.