Model stealing, i.e., the unauthorized access to and exfiltration of deep learning models, has become a major threat. Proprietary models may be protected by access controls and encryption. In practice, however, these measures can be defeated by system breaches, query-based model extraction, or a disgruntled insider. Security hardening of the neural network itself also has limits: for example, model watermarking is passive, cannot prevent piracy from occurring, and is not robust against model transformations. To this end, we propose a native authentication mechanism, called AuthNet, which integrates authentication logic into the model itself without any additional structures. Our key insight is to reuse redundant neurons with low activation and embed authentication bits in an intermediate layer, called the gate layer. AuthNet then fine-tunes the layers after the gate layer to embed the authentication logic, so that only inputs carrying a special secret key trigger the model's correct behavior. This design has two intuitive advantages. First, it provides a last line of defense: even if the model is exfiltrated, it is not usable, because the adversary cannot generate valid inputs without the key. Second, the authentication logic is difficult to inspect and identify among the millions or billions of neurons in the model. We theoretically demonstrate the high sensitivity of AuthNet to the secret key and its high confusion for unauthorized samples. AuthNet is compatible with any convolutional neural network. Our extensive evaluations show that AuthNet rejects unauthenticated users (whose average accuracy drops to 22.03%) with a trivial accuracy decrease (1.18% on average) for legitimate users, and that it is robust against model transformation and adaptive attacks.
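To make the gate-layer idea concrete, the following is a minimal sketch, not the AuthNet implementation. It assumes that gate-layer activations are available as plain lists of floats, that "redundant" neurons can be approximated by the lowest mean absolute activation on a small calibration set, and that authentication bits can be read off by thresholding. The function names, the threshold value, and the explicit equality check are hypothetical. In the abstract's description the authentication logic is embedded in the fine-tuned layers after the gate layer and is not a separate check.

```python
from statistics import mean


def select_gate_neurons(activations, k):
    """Return indices of the k neurons with the lowest mean absolute activation.

    `activations` is a list of gate-layer activation vectors collected on a
    calibration set (an assumption of this sketch).
    """
    n = len(activations[0])
    mean_abs = [mean(abs(row[j]) for row in activations) for j in range(n)]
    return sorted(range(n), key=lambda j: mean_abs[j])[:k]


def read_bits(activation, gate_idx, threshold=0.5):
    """Threshold the selected neurons to recover one bit per neuron."""
    return [1 if activation[j] > threshold else 0 for j in gate_idx]


def is_authenticated(activation, gate_idx, secret_bits, threshold=0.5):
    """Hypothetical explicit check: do the recovered bits match the secret?"""
    return read_bits(activation, gate_idx, threshold) == secret_bits


# Toy calibration activations for a 4-neuron gate layer (made-up numbers).
calibration = [
    [0.9, 0.01, 0.7, 0.02],
    [0.8, 0.00, 0.6, 0.03],
    [1.0, 0.02, 0.9, 0.01],
]
gate_idx = select_gate_neurons(calibration, k=2)  # neurons 1 and 3 are near-silent
secret_bits = [1, 0]                              # illustrative secret pattern

keyed_input_act = [0.9, 0.95, 0.7, 0.01]  # key drives neuron 1 high, leaves neuron 3 low
plain_input_act = [0.9, 0.01, 0.7, 0.02]  # ordinary input leaves both low

print(gate_idx)
print(is_authenticated(keyed_input_act, gate_idx, secret_bits))
print(is_authenticated(plain_input_act, gate_idx, secret_bits))
```

In this toy setting, ordinary inputs leave the low-activation neurons near zero, so only an input that carries the key produces the expected bit pattern. This is why such neurons are natural carriers for authentication bits.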