The minimal-norm weight perturbations of DNNs required to achieve a specified change in output are derived, and the factors determining their size are discussed. These exact single-layer formulae are contrasted with more generic multi-layer robustness guarantees based on Lipschitz constants; the two are observed to be of the same order, indicating comparable tightness in the guarantees they provide. These results are applied to precision-modification-activated backdoor attacks, establishing provable compression thresholds below which such attacks cannot succeed; it is further shown empirically that low-rank compression can reliably activate latent backdoors while preserving full-precision accuracy. The derived expressions reveal how back-propagated margins govern layer-wise sensitivity and provide certifiable guarantees on the smallest parameter updates consistent with a desired output shift.
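To make the single-layer case concrete, the following is a minimal NumPy sketch (not the paper's code; the layer, variable names, and dimensions are illustrative). For a linear layer y = Wx with fixed input x, the least Frobenius-norm perturbation dW satisfying (W + dW)x = y + dy is the rank-one update dW = dy xᵀ / ‖x‖², whose norm ‖dW‖_F = ‖dy‖ / ‖x‖ shrinks as the input grows:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((5, 8))   # layer weights (illustrative sizes)
x = rng.standard_normal(8)        # fixed input to the layer
dy = rng.standard_normal(5)       # desired shift in the layer's output

# Minimal-norm solution of dW @ x = dy: the rank-one outer product
# dy x^T / ||x||^2 (the least-norm solution via the pseudoinverse of x).
dW = np.outer(dy, x) / np.dot(x, x)

# The perturbed layer realises exactly the requested output shift ...
assert np.allclose((W + dW) @ x, W @ x + dy)
# ... and its Frobenius norm matches the closed form ||dy|| / ||x||.
assert np.isclose(np.linalg.norm(dW), np.linalg.norm(dy) / np.linalg.norm(x))
print("minimal perturbation norm:", np.linalg.norm(dW))
```

The closed form makes the size-determining factors visible: the required perturbation scales directly with the demanded output change and inversely with the magnitude of the layer's input.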