Prior approaches to membership privacy preservation typically update or retrain all weights in a neural network, which is costly and can cause unnecessary utility loss, or even worsen the misalignment in predictions between training and non-training data. In this work, we make three observations: i) privacy vulnerability is concentrated in a very small fraction of weights; ii) however, most of those weights also critically affect utility; iii) the importance of a weight stems from its location rather than its value. Guided by these insights, we preserve privacy by scoring the critical weights and, rather than discarding the corresponding neurons, rewinding only those weights and fine-tuning them. Through extensive experiments, we show that this mechanism achieves superior resilience against Membership Inference Attacks in most cases while maintaining utility.
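To make the score-rewind-finetune idea concrete, below is a minimal PyTorch sketch. It is an illustration under stated assumptions, not the paper's implementation: the per-weight scoring function (`score_fn`), the rewind target (`init_state`, e.g. an early-training checkpoint), and the selected fraction (`frac`) are all hypothetical placeholders.

```python
import torch

def rewind_and_finetune(model, init_state, score_fn, loader,
                        frac=0.01, lr=1e-3, epochs=1):
    """Hypothetical sketch: score privacy-critical weights, rewind the
    top fraction to earlier values, and fine-tune only those weights."""
    # 1) Score each weight's privacy criticality.
    #    `score_fn` is an assumed helper returning {param_name: score tensor}.
    scores = score_fn(model, loader)

    # 2) Build a binary mask selecting the top `frac` of weights per tensor.
    masks = {}
    for name, s in scores.items():
        k = max(1, int(frac * s.numel()))
        thresh = s.flatten().topk(k).values.min()
        masks[name] = (s >= thresh).float()

    # 3) Rewind only the selected weights to their earlier (init_state) values;
    #    all other weights keep their trained values.
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                m = masks[name]
                p.copy_(m * init_state[name] + (1 - m) * p)

    # 4) Fine-tune: mask out gradients of unselected weights each step,
    #    so only the rewound weights are updated.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            for name, p in model.named_parameters():
                if name in masks and p.grad is not None:
                    p.grad *= masks[name]
            opt.step()
    return model
```

Note how the sketch mirrors insight iii): the mask fixes which weight *locations* may change, while their values are discarded by the rewind and re-learned during fine-tuning.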