Online questionnaires that recruit participants through crowd-sourcing platforms have become commonplace because they are easy to use and inexpensive. Large Language Models (LLMs) based on Artificial Intelligence (AI) have made it easy for bad actors to fill in online forms automatically, including generating meaningful text for open-ended tasks. These technological advances threaten the data quality of studies that use online questionnaires. This study tested whether text generated by an AI for the purpose of an online study can be detected by humans and by automatic AI detection systems. While humans were able to identify the authorship of text above chance level (76 percent accuracy), their performance was still below what would be required to ensure satisfactory data quality. For open-ended responses to remain a useful tool for ensuring data quality, researchers currently have to rely on bad actors' lack of interest. Automatic AI detection systems are currently completely unusable. If AIs become too prevalent in submitting responses, the costs associated with detecting fraudulent submissions will outweigh the benefits of online questionnaires. Individual attention checks will no longer be a sufficient tool to ensure good data quality. This problem can only be addressed systematically by the crowd-sourcing platforms themselves. They cannot rely on automatic AI detection systems, and it is unclear how they can ensure data quality for their paying clients.