Random cropping is one of the most common data augmentation techniques in computer vision, yet the role of its inherent randomness in the differentially private training of machine learning models has so far gone unexplored. We observe that when sensitive content in an image is spatially localized, such as a face or a license plate, random cropping can probabilistically exclude that content from the model's input. This introduces a third source of stochasticity into differentially private stochastic gradient descent (DP-SGD), in addition to gradient noise and minibatch sampling. This additional randomness amplifies the privacy guarantee without requiring changes to the model architecture or the training procedure. We formalize this effect by introducing a patch-level neighboring relation for vision data and deriving tight privacy bounds for DP-SGD combined with random cropping. Our analysis quantifies the patch inclusion probability and shows how it composes with minibatch sampling to yield a lower effective sampling rate. Empirically, we show that patch-level amplification improves the privacy-utility trade-off across multiple segmentation architectures and datasets. Our results demonstrate that aligning privacy accounting with the structure of the domain and with existing sources of randomness can yield stronger guarantees at no additional cost.
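The interaction between the patch inclusion probability and minibatch sampling can be illustrated with a small calculation. The sketch below is not the authors' accounting method. It assumes a single axis-aligned crop of size `h x w` whose top-left offset is drawn uniformly over all valid integer positions in an `H x W` image (no padding). It also assumes a rectangular sensitive patch lying fully inside the image, and that the patch "reaches" the model whenever the crop overlaps it by at least one pixel. Under the further assumption that cropping is independent of minibatch sampling with rate `q`, the per-step probability that the patch influences the gradient is `q * p`, which this sketch treats as the effective sampling rate. All function names are hypothetical.

```python
from fractions import Fraction


def axis_overlap_count(size: int, crop: int, start: int, length: int) -> int:
    """Count crop offsets o in [0, size - crop] whose window [o, o + crop)
    intersects the patch interval [start, start + length) along one axis."""
    lo = max(0, start - crop + 1)               # first offset that still reaches the patch
    hi = min(size - crop, start + length - 1)   # last offset that starts inside the patch
    return max(0, hi - lo + 1)


def patch_inclusion_probability(img_hw, crop_hw, patch_top_left, patch_hw) -> Fraction:
    """Exact probability that a uniformly placed crop overlaps the patch.

    Assumption (hypothetical setup): row and column offsets are independent
    and uniform, so the 2D probability factorizes into two 1D probabilities.
    """
    H, W = img_hw
    h, w = crop_hw
    r, c = patch_top_left
    ph, pw = patch_hw
    if not (0 < h <= H and 0 < w <= W):
        raise ValueError("crop must be non-empty and fit inside the image")
    if not (0 <= r and r + ph <= H and 0 <= c and c + pw <= W and ph > 0 and pw > 0):
        raise ValueError("patch must be non-empty and lie inside the image")
    p_rows = Fraction(axis_overlap_count(H, h, r, ph), H - h + 1)
    p_cols = Fraction(axis_overlap_count(W, w, c, pw), W - w + 1)
    return p_rows * p_cols


def effective_sampling_rate(q, p_incl):
    """Per-step probability that the patch influences the gradient, assuming
    minibatch sampling (rate q) and cropping are independent."""
    return q * p_incl


if __name__ == "__main__":
    img, crop, patch = (512, 512), (256, 256), (32, 32)
    p_corner = patch_inclusion_probability(img, crop, (0, 0), patch)
    p_center = patch_inclusion_probability(img, crop, (240, 240), patch)
    print("corner patch:", float(p_corner), "q_eff:", effective_sampling_rate(0.01, float(p_corner)))
    print("center patch:", float(p_center), "q_eff:", effective_sampling_rate(0.01, float(p_center)))
```

In this hypothetical setting, a 32x32 patch in the corner of a 512x512 image is covered by a 256x256 crop with probability (32/257)^2, roughly 0.0155. With `q = 0.01`, the effective rate therefore drops to about 1.55e-4. A patch at the center is covered by every crop, so it gains no amplification. Exact `Fraction` arithmetic keeps the probabilities free of rounding before they are passed to a privacy accountant.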