Adversarial robustness, the ability of a model to withstand manipulated inputs that cause errors, is essential for ensuring the trustworthiness of machine learning models in real-world applications. However, previous studies have shown that enhancing adversarial robustness through adversarial training increases vulnerability to privacy attacks. While differential privacy can mitigate these attacks, it often compromises robustness against both natural and adversarial samples. Our analysis reveals that differential privacy disproportionately impacts low-risk samples, causing an unintended performance drop. To address this, we propose DeMem, which selectively targets high-risk samples, achieving a better balance between privacy protection and model robustness. DeMem is versatile and can be seamlessly integrated into various adversarial training techniques. Extensive evaluations across multiple training methods and datasets demonstrate that DeMem significantly reduces privacy leakage while maintaining robustness against both natural and adversarial samples. These results confirm DeMem's effectiveness and broad applicability in enhancing privacy without compromising robustness.