Human- vs. AI-generated tests: dimensionality and information accuracy in latent trait evaluation

From arXiv: 28 pages, 12 figures. Minor corrections and comments added. The final version of this preprint will be published in Statistics, with the following DOI: 10.1080/02331888.2025.2610647

Artificial Intelligence (AI) and large language models (LLMs) are increasingly used in social and psychological research. Among potential applications, LLMs can be used to generate, customise, or adapt measurement instruments. This study presents a preliminary investigation of AI-generated questionnaires by comparing two ChatGPT-based adaptations of the Body Awareness Questionnaire (BAQ) with the validated human-developed version. The AI instruments were designed with different levels of explicitness in content and instructions on construct facets, and their psychometric properties were assessed using a Bayesian Graded Response Model. Results show that although surface wording between AI and original items was similar, differences emerged in dimensionality and in the distribution of item and test information across latent traits. These findings illustrate the importance of applying statistical measures of accuracy to ensure the validity and interpretability of AI-driven tools.
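
The abstract refers to item and test information under a Graded Response Model. As a rough illustration of what those quantities are, the sketch below computes category probabilities and Fisher information for a unidimensional graded response model with a logistic link. It is not the authors' Bayesian estimation procedure: the function names, the single-trait simplification, and the item parameters are all assumptions made for this example.

```python
import numpy as np


def grm_cumulative(theta, a, b):
    """Return P(Y >= k | theta) for k = 0..K, shape (n, K + 1).

    theta: latent trait values; a: item discrimination (scalar);
    b: increasing category thresholds of length K - 1.
    """
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    b = np.asarray(b, dtype=float)
    # Logistic curve for each threshold, one column per threshold.
    inner = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b[None, :])))
    # Pad with P(Y >= 0) = 1 on the left and P(Y >= K) = 0 on the right.
    ones = np.ones((theta.size, 1))
    zeros = np.zeros((theta.size, 1))
    return np.hstack([ones, inner, zeros])


def grm_category_probs(theta, a, b):
    """Category probabilities: differences of adjacent cumulative curves."""
    cum = grm_cumulative(theta, a, b)
    return cum[:, :-1] - cum[:, 1:]


def grm_item_information(theta, a, b):
    """Item information: sum over categories of (dP_k/dtheta)^2 / P_k."""
    cum = grm_cumulative(theta, a, b)
    probs = cum[:, :-1] - cum[:, 1:]
    # d/dtheta of each cumulative curve is a * cum * (1 - cum);
    # the padded columns (1 and 0) contribute zero slope.
    slope = cum * (1.0 - cum)
    dprobs = a * (slope[:, :-1] - slope[:, 1:])
    # Guard against division by zero at extreme theta values.
    return np.sum(dprobs**2 / np.maximum(probs, 1e-12), axis=1)


def grm_test_information(theta, items):
    """Test information: sum of item information over (a, b) pairs."""
    return sum(grm_item_information(theta, a, b) for a, b in items)


# Hypothetical item parameters, chosen only for illustration.
theta_grid = np.linspace(-3.0, 3.0, 7)
items = [(1.5, [-1.0, 0.0, 1.0]), (0.8, [-0.5, 0.5, 1.5])]
info = grm_test_information(theta_grid, items)
```

Plotting `info` against `theta_grid` for two instruments would show where along the latent trait each one measures most precisely, which is the kind of comparison the abstract describes between the AI-generated and original questionnaires.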
