The advancement of large language models (LLMs) has made it difficult to differentiate human-written text from AI-generated text. Several AI-text detectors have been developed in response, which typically apply a fixed global threshold (e.g., $\theta = 0.5$) to classify machine-generated text. However, a single universal threshold can fail to account for distributional variation across subgroups. For example, with a fixed threshold, detectors make more false-positive errors on shorter human-written text and, among long texts, more positive classifications of neurotic writing styles. These discrepancies can lead to misclassifications that disproportionately affect certain groups. We address this critical limitation by introducing FairOPT, an algorithm for group-specific threshold optimization in probabilistic AI-text detectors. We partition data into subgroups based on attributes (e.g., text length and writing style) and apply FairOPT to learn a decision threshold for each group that reduces discrepancy. FairOPT achieves notable discrepancy mitigation across nine detectors and three heterogeneous datasets, and substantially mitigates the minimax problem, decreasing overall discrepancy by 27.4% across five metrics while sacrificing only 0.005% accuracy. Our framework paves the way for more robust classification in AI-generated content detection via post-processing. We release our data, code, and project information at URL.
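The core post-processing idea, partitioning scored examples into subgroups and selecting one decision threshold per group so that group-level error rates stay close while accuracy is preserved, can be sketched as follows. This is a minimal illustration, not the FairOPT algorithm itself: the grid search, the false-positive-rate gap objective, and the `acc_floor` accuracy constraint are assumptions standing in for the paper's actual optimization.

```python
import numpy as np

def fit_group_thresholds(scores, labels, groups, acc_floor=0.9):
    """Sketch: pick one decision threshold per subgroup.

    For each group, grid-search a threshold that keeps group accuracy
    above `acc_floor` while pulling the group's false-positive rate
    toward the global FPR at the conventional fixed threshold 0.5.
    (Stand-in objective; FairOPT's real criterion is not shown here.)
    """
    cand = np.linspace(0.05, 0.95, 19)  # candidate thresholds
    neg = labels == 0
    # Global false-positive rate under the fixed threshold 0.5.
    global_fpr = float(np.mean(scores[neg] >= 0.5)) if neg.any() else 0.0

    thresholds = {}
    for g in np.unique(groups):
        mask = groups == g
        s, y = scores[mask], labels[mask]
        best_t, best_gap = 0.5, float("inf")
        for t in cand:
            pred = (s >= t).astype(int)
            acc = float(np.mean(pred == y))
            n = y == 0
            fpr = float(np.mean(pred[n] == 1)) if n.any() else 0.0
            gap = abs(fpr - global_fpr)
            # Keep the threshold with the smallest FPR gap that
            # still satisfies the accuracy constraint.
            if acc >= acc_floor and gap < best_gap:
                best_t, best_gap = t, gap
        thresholds[g] = float(best_t)
    return thresholds
```

At prediction time, each example is classified with the threshold of its subgroup (e.g., `scores >= thresholds[group]`) instead of a single global cutoff.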