Artificial Intelligence (AI) and large language models (LLMs) are increasingly used in social and psychological research. Among their potential applications, LLMs can be used to generate, customise, or adapt measurement instruments. This study presents a preliminary investigation of AI-generated questionnaires by comparing two ChatGPT-based adaptations of the Body Awareness Questionnaire (BAQ) with the validated, human-developed version. The AI instruments were designed with different levels of explicitness in the content and instructions concerning construct facets, and their psychometric properties were assessed using a Bayesian Graded Response Model. Results show that although the surface wording of the AI-generated and original items was similar, differences emerged in dimensionality and in the distribution of item and test information across latent traits. These findings underscore the importance of applying statistical measures of accuracy to ensure the validity and interpretability of AI-driven tools.