Many recent works focus on designing algorithms that seek flatter optima when optimizing neural network losses, as there is empirical evidence that flatter optima lead to better generalization performance on many datasets. In this work, we dissect these performance gains through the lens of data memorization in overparameterized models. We define a new metric that helps us identify the specific data points on which algorithms seeking flatter optima do better than vanilla SGD. We find that the generalization gains achieved by Sharpness-Aware Minimization (SAM) are particularly pronounced for atypical data points, which necessitate memorization. This insight helps us unearth higher privacy risks associated with SAM, which we verify through exhaustive empirical evaluations. Finally, we propose mitigation strategies to achieve a more desirable accuracy vs. privacy tradeoff.