Artificial Intelligence (AI) and large language models (LLMs) are increasingly used in social and psychological research. Among their potential applications, LLMs can generate, customise, or adapt measurement instruments. This study presents a preliminary investigation of AI-generated questionnaires by comparing two ChatGPT-based adaptations of the Body Awareness Questionnaire (BAQ) with the validated, human-developed version. The AI-generated instruments were designed with different levels of explicitness in content and in the instructions about construct facets, and their psychometric properties were assessed using a Bayesian Graded Response Model. Results show that although the surface wording of the AI-generated items closely resembled the originals, differences emerged in dimensionality and in the distribution of item and test information across latent traits. These findings underline the importance of applying statistical measures of accuracy to ensure the validity and interpretability of AI-driven tools.
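For readers unfamiliar with the Graded Response Model mentioned above, the following is a minimal sketch of how Samejima's GRM assigns probabilities to ordered response categories given a latent trait level. This is a generic, non-Bayesian illustration with a logistic link; the function name and parameter values are illustrative and do not come from the study itself.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Samejima's Graded Response Model: probabilities of each ordered
    response category for a single item at latent trait level theta.

    theta      -- latent trait value of the respondent
    a          -- item discrimination parameter
    thresholds -- ordered boundary parameters b_1 < ... < b_{K-1}
    Returns a list of K category probabilities summing to 1.
    """
    # Cumulative probability of responding in category k or above,
    # modeled with a logistic function of a * (theta - b_k)
    def p_star(b):
        return 1.0 / (1.0 + math.exp(-a * (theta - b)))

    # Bounding cumulatives: P*(>=lowest) = 1, P*(> highest) = 0
    cum = [1.0] + [p_star(b) for b in thresholds] + [0.0]

    # Category probability = difference of adjacent cumulatives
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

# Example: a 4-category item with illustrative parameters
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.0])
```

In the Bayesian version used in the study, priors are placed over the item parameters (a and the thresholds) and the posterior is sampled rather than point-estimated, but the category-probability structure above is the same.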