The deployment of Machine-Generated Text (MGT) detection systems necessitates processing sensitive user data, creating a fundamental conflict between authorship verification and privacy preservation. Standard anonymization techniques often disrupt linguistic fluency, while rigorous Differential Privacy (DP) mechanisms typically degrade the statistical signals required for accurate detection. To resolve this dilemma, we propose \textbf{DP-MGTD}, a framework incorporating an Adaptive Differentially Private Entity Sanitization algorithm. Our approach utilizes a two-stage mechanism that performs noisy frequency estimation and dynamically calibrates privacy budgets, applying the Laplace and Exponential mechanisms to numerical and textual entities, respectively. Crucially, we identify a counter-intuitive phenomenon in which DP noise amplifies the distinguishability between human and machine text by exposing their distinct sensitivity patterns under perturbation. Extensive experiments on the MGTBench-2.0 dataset show that our method achieves near-perfect detection accuracy, significantly outperforming non-private baselines while satisfying strict privacy guarantees.
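The two DP primitives named above are standard. As a point of reference (not the authors' implementation, whose budget calibration and entity handling are specific to DP-MGTD), a minimal sketch of the Laplace mechanism for a numerical entity and the Exponential mechanism for selecting a textual replacement, with hypothetical helper names, might look like:

```python
import math
import random

def laplace_mechanism(value, sensitivity, epsilon):
    """Perturb a numeric entity with Laplace(scale = sensitivity / epsilon) noise."""
    scale = sensitivity / epsilon
    # Inverse-CDF sampling of Laplace noise; u lies in [-0.5, 0.5).
    u = random.random() - 0.5
    return value - scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

def exponential_mechanism(candidates, utility, sensitivity, epsilon):
    """Sample a textual candidate with probability proportional to
    exp(epsilon * utility(c) / (2 * sensitivity))."""
    weights = [math.exp(epsilon * utility(c) / (2.0 * sensitivity))
               for c in candidates]
    r = random.uniform(0.0, sum(weights))
    acc = 0.0
    for cand, w in zip(candidates, weights):
        acc += w
        if r <= acc:
            return cand
    return candidates[-1]  # guard against floating-point round-off
```

Under an adaptive scheme such as the one the abstract describes, `epsilon` would be the per-entity budget assigned from the noisy frequency estimates rather than a global constant; the `utility` function scoring textual replacements is likewise an assumption of this sketch.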