Artificial Intelligence (AI) and large language models (LLMs) are increasingly used in social and psychological research. Among their potential applications, LLMs can generate, customise, or adapt measurement instruments. This study presents a preliminary investigation of AI-generated questionnaires by comparing two ChatGPT-based adaptations of the Body Awareness Questionnaire (BAQ) with the validated, human-developed version. The AI-generated instruments were designed with different levels of explicitness in content and in the instructions about construct facets, and their psychometric properties were assessed using a Bayesian Graded Response Model. Results show that although the surface wording of the AI-generated items closely resembled the originals, differences emerged in dimensionality and in the distribution of item and test information across latent traits. These findings underline the importance of applying statistical measures of accuracy to ensure the validity and interpretability of AI-driven tools.
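For readers unfamiliar with the Graded Response Model mentioned above, the following is a minimal sketch of how Samejima's GRM assigns probabilities to ordered response categories given a latent trait level. This is a generic, non-Bayesian illustration with a logistic link; the function name and parameter values are illustrative and do not come from the study itself.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Samejima's Graded Response Model: probabilities of each ordered
    response category for a single item at latent trait level theta.

    theta      -- latent trait value of the respondent
    a          -- item discrimination parameter
    thresholds -- ordered boundary parameters b_1 < ... < b_{K-1}
    Returns a list of K category probabilities summing to 1.
    """
    # Cumulative probability of responding in category k or above,
    # modeled with a logistic function of a * (theta - b_k)
    def p_star(b):
        return 1.0 / (1.0 + math.exp(-a * (theta - b)))

    # Bounding cumulatives: P*(>=lowest) = 1, P*(> highest) = 0
    cum = [1.0] + [p_star(b) for b in thresholds] + [0.0]

    # Category probability = difference of adjacent cumulatives
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

# Example: a 4-category item with illustrative parameters
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.0])
```

In the Bayesian version used in the study, priors are placed over the item parameters (a and the thresholds) and the posterior is sampled rather than point-estimated, but the category-probability structure above is the same.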