Standardized, validated questionnaires are vital tools in HCI research and healthcare because they provide dependable self-report data. However, their repeated use in longitudinal or pre-post studies can induce respondent fatigue, which degrades data quality through response biases and lower response rates. We propose using large language models (LLMs) to generate diverse questionnaire versions that retain good psychometric properties. In a longitudinal study, participants engaged with our agent system and responded daily for two weeks to either a standardized depression questionnaire or one of two LLM-generated questionnaire variants, alongside a separate validated depression questionnaire that served as the external criterion. Psychometric testing revealed consistent covariation between the external criterion and the focal measure across the three conditions, demonstrating the reliability and validity of the LLM-generated variants. Participants rated the repeated administration of the standardized questionnaire as significantly more repetitive than the variants. Our findings highlight the potential of LLM-generated variants to invigorate questionnaires, fostering engagement and interest without compromising validity.