The rapid adoption of generative language models has brought substantial advances in digital communication while raising concerns about the potential misuse of AI-generated content. Although numerous detection methods have been proposed to distinguish AI-generated from human-written content, the fairness and robustness of these detectors remain underexplored. In this study, we evaluate the performance of several widely used GPT detectors on writing samples from native and non-native English writers. Our findings reveal that these detectors consistently misclassify non-native English writing samples as AI-generated, whereas native writing samples are accurately identified. Furthermore, we demonstrate that simple prompting strategies can not only mitigate this bias but also effectively bypass GPT detectors, suggesting that GPT detectors may unintentionally penalize writers with constrained linguistic expressions. Our results call for a broader conversation about the ethical implications of deploying ChatGPT content detectors and caution against their use in evaluative or educational settings, particularly when they may inadvertently penalize non-native English speakers or exclude them from the global discourse.