The meteoric rise in text generation capabilities has been accompanied by a parallel growth of interest in machine-generated text detection: the ability to identify whether a given text was generated by a model or written by a person. While detection models perform strongly, they also have the potential to cause significant harm. We explore potential biases in English machine-generated text detection systems. We curate a dataset of student essays and assess 16 different detection systems for bias across four attributes: gender, race/ethnicity, English-language learner (ELL) status, and economic status. We evaluate these attributes using regression-based models to determine the significance and power of their effects, and we additionally perform subgroup analyses. We find that while biases are generally inconsistent across systems, a few key issues emerge: several models tend to classify essays from disadvantaged groups as machine-generated; ELL essays are more likely to be classified as machine-generated; economically disadvantaged students' essays are less likely to be classified as machine-generated; and non-White ELL essays are disproportionately classified as machine-generated relative to those of their White ELL counterparts. Finally, we perform human annotation and find that while humans generally perform poorly at the detection task, they show no significant biases on the studied attributes.
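The regression-based bias test mentioned above can be illustrated in its simplest form. This is a minimal sketch, not the authors' models (the abstract does not specify them). It assumes a single binary attribute (for example ELL vs. non-ELL), human-written essays only, so that every "machine-generated" flag is a false positive, and detector outputs that have already been thresholded to yes/no. With one binary covariate, the logistic-regression coefficient equals the log odds ratio of the 2x2 flag table, so it can be computed in closed form with a Wald test for significance. The function name `flag_bias` and the counts in the usage line are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class BiasResult:
    rate_group: float      # share of the group's (human-written) essays flagged as machine-generated
    rate_reference: float  # same share for the reference group
    log_odds_ratio: float  # logistic-regression coefficient for the group indicator
    std_error: float       # Wald standard error of that coefficient
    p_value: float         # two-sided Wald test p-value


def flag_bias(flags_group, flags_reference):
    """Hypothetical helper: compare detector flags (True = 'machine-generated')
    between a group and a reference group, assuming all essays are human-written."""
    a = sum(flags_group)              # group essays flagged
    b = len(flags_group) - a          # group essays not flagged
    c = sum(flags_reference)          # reference essays flagged
    d = len(flags_reference) - c      # reference essays not flagged
    if min(a, b, c, d) == 0:
        raise ValueError("every cell of the 2x2 table must be non-zero")
    # With a single binary covariate, the logistic-regression MLE for that
    # covariate is exactly the log odds ratio of the 2x2 table.
    beta = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return BiasResult(a / (a + b), c / (c + d), beta, se, p)


# Illustrative counts only (not results from the study).
ell = [True] * 30 + [False] * 70
non_ell = [True] * 10 + [False] * 90
result = flag_bias(ell, non_ell)
print(result)
```

A positive coefficient with a small p-value would indicate that the detector flags the group's human-written essays more often than the reference group's. Multi-attribute analyses such as the ones described above would instead fit a full logistic regression with several covariates (for example with `statsmodels`), which this closed-form shortcut does not cover.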