In this paper, we consider the setting where machine learning models are retrained on updated datasets to incorporate the most recent information or to reflect distribution shifts. We investigate whether one can infer information about these updates to the training data (e.g., changes to attribute values of records). Here, the adversary has access to snapshots of the machine learning model before and after the change in the dataset occurs. In contrast to the existing literature, we assume that an attribute of one or more training data points is changed, rather than entire data records being removed or added. We propose attacks based on the difference between the prediction confidence of the original model and that of the updated model. We evaluate our attack methods on two public datasets with multi-layer perceptron and logistic regression models. We validate that access to two snapshots of the model can result in higher information leakage than access to only the updated model. Moreover, we observe that data records with rare values are more vulnerable to attacks, which points to disparate vulnerability to privacy attacks in the update setting. When multiple records with the same original attribute value are updated to the same new value (i.e., repeated changes), the attacker is more likely to correctly guess the updated values, since repeated changes leave a larger footprint on the trained model. These observations point to the vulnerability of machine learning models to attribute inference attacks in the update setting.
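To make the idea of a confidence-difference attack concrete, the following is a minimal sketch under stated assumptions; it is not the attack evaluated in the paper, whose exact scoring rule may differ. It assumes the adversary knows every attribute of a record except the updated one, knows the record's label, and can query both model snapshots for the confidence they assign to that label. The names `confidence_difference_attack` and `make_toy_model`, and the toy "memorizing" model itself, are hypothetical and exist only for illustration.

```python
from typing import Callable, Dict, Hashable, Sequence

Record = Dict[str, Hashable]
# Assumption: a model snapshot is abstracted as a function
# (record, label) -> confidence the model assigns to that label.
ConfidenceFn = Callable[[Record, int], float]


def confidence_difference_attack(
    model_before: ConfidenceFn,
    model_after: ConfidenceFn,
    known_record: Record,
    label: int,
    target_attribute: str,
    candidate_values: Sequence[Hashable],
) -> Hashable:
    """Guess the updated value of `target_attribute`.

    For each candidate value, fill it into the partially known record and
    measure how much the confidence for the true label rose between the
    two snapshots. The candidate with the largest increase is returned.
    """
    scores = {}
    for value in candidate_values:
        candidate = dict(known_record)  # copy so the input is not mutated
        candidate[target_attribute] = value
        scores[value] = model_after(candidate, label) - model_before(candidate, label)
    return max(scores, key=scores.get)


def make_toy_model(memorized: set) -> ConfidenceFn:
    """Hypothetical stand-in for a trained model: it is more confident
    (0.9 instead of 0.5) on the exact records it was 'trained' on."""
    def confidence(record: Record, label: int) -> float:
        key = (record["age"], record["job"], label)
        return 0.9 if key in memorized else 0.5
    return confidence


# The record's "job" attribute changes from "nurse" to "teacher" between snapshots.
before = make_toy_model({(30, "nurse", 1)})
after = make_toy_model({(30, "teacher", 1)})

guess = confidence_difference_attack(
    before, after,
    known_record={"age": 30},
    label=1,
    target_attribute="job",
    candidate_values=["nurse", "teacher", "clerk"],
)
print(guess)
```

In this toy setup, the original value receives a negative score (confidence dropped after the update), an unrelated value scores zero, and the updated value scores highest. An attacker with only the updated snapshot would have to rely on `model_after` alone, without the contrast that the earlier snapshot provides.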