Training contemporary deep learning models relies heavily on publicly available data, which exposes online data to unauthorized use and raises data privacy concerns. Current approaches to creating unlearnable data add small, specially designed noise, but they restrict data usability for everyone and overlook legitimate use in authorized scenarios. In this paper, we extend the concept of unlearnable data to conditional data learnability and introduce \textbf{U}n\textbf{G}eneralizable \textbf{E}xamples (UGEs). UGEs remain learnable for authorized users while staying unlearnable for potential hackers. The protector defines the authorized network and optimizes UGEs so that the gradients produced by the ungeneralizable version match those of the original data, ensuring learnability. To prevent unauthorized learning, UGEs are trained by maximizing a designated distance loss in a common feature space. To further safeguard the authorized side from potential attacks, we introduce an additional undistillation optimization. Experimental results on multiple datasets and various networks demonstrate that the proposed UGE framework preserves data usability while reducing training performance on hacker networks, even under different types of attacks.
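As a reading aid, the three objectives named above can be sketched as a single optimization over the protected sample. The notation is an assumption made for illustration and is not taken from the paper: $x$ is an original sample with label $y$, $\tilde{x}$ is its ungeneralizable version, $f_\theta$ is the authorized network with loss $\ell$, $\phi$ is a feature extractor for the common feature space, $D$ and $d$ are unspecified distance functions, $\mathcal{L}_{\mathrm{undistill}}$ stands for the undistillation term (whose form the abstract does not give), and $\lambda_1, \lambda_2 \ge 0$ are hypothetical weights.

\[
\min_{\tilde{x}} \;
D\big(\nabla_\theta \ell(f_\theta(x), y),\; \nabla_\theta \ell(f_\theta(\tilde{x}), y)\big)
\;-\; \lambda_1\, d\big(\phi(x), \phi(\tilde{x})\big)
\;+\; \lambda_2\, \mathcal{L}_{\mathrm{undistill}}(\tilde{x})
\]

The first term corresponds to gradient matching on the authorized network, and the second, entering with a negative sign, corresponds to maximizing the feature-space distance. Whether the actual method combines these terms in one weighted sum or optimizes them in separate stages is not stated in the abstract.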