We introduce a new analytical framework to quantify how a machine learning algorithm's output distribution changes when a few data points are included in its training set, a notion we define as leave-one-out distinguishability (LOOD). This problem is key to measuring data **memorization** and **information leakage** in machine learning, as well as the **influence** of training data points on model predictions. We illustrate how our method broadens and refines existing empirical measures of memorization and privacy risks associated with training data. We use Gaussian processes to model the randomness of machine learning algorithms, and validate LOOD with extensive empirical analysis of information leakage using membership inference attacks. Our theoretical framework enables us to investigate the causes of information leakage and to identify where the leakage is high. For example, we analyze the influence of activation functions on data memorization. Additionally, our method allows us to optimize queries that disclose the most significant information about the training data in the leave-one-out setting. We illustrate how optimal queries can be used for accurate **reconstruction** of training data.
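To make the leave-one-out idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a zero-mean Gaussian process regressor with a squared-exponential (RBF) kernel, a fixed observation-noise variance, and the KL divergence between the two predictive distributions at a single query as the distinguishability measure. All function names (`rbf_kernel`, `gp_posterior`, `gaussian_kl`, `lood`), the toy data, and the hyperparameters are hypothetical choices made for illustration.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between rows of a and rows of b (assumed kernel)."""
    sq_dists = (np.sum(a**2, axis=1)[:, None]
                + np.sum(b**2, axis=1)[None, :]
                - 2.0 * a @ b.T)
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / length_scale**2)

def gp_posterior(x_train, y_train, x_query, noise=1e-2):
    """Predictive mean and variance of a zero-mean GP at one query point."""
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train)   # shape (1, n)
    k_qq = rbf_kernel(x_query, x_query)   # shape (1, 1)
    # Solve once for both the targets and the query covariances.
    solved = np.linalg.solve(k_tt, np.hstack([y_train[:, None], k_qt.T]))
    mean = (k_qt @ solved[:, :1]).item()
    # Variance of a noisy observation at the query (latent variance + noise).
    var = (k_qq - k_qt @ solved[:, 1:]).item() + noise
    return mean, var

def gaussian_kl(m0, v0, m1, v1):
    """KL( N(m0, v0) || N(m1, v1) ) for univariate Gaussians."""
    return 0.5 * (np.log(v1 / v0) + (v0 + (m0 - m1) ** 2) / v1 - 1.0)

def lood(x_train, y_train, x_diff, y_diff, x_query, noise=1e-2):
    """Divergence between predictions trained without and with the differing point."""
    mean_out, var_out = gp_posterior(x_train, y_train, x_query, noise)
    x_in = np.vstack([x_train, x_diff])
    y_in = np.append(y_train, y_diff)
    mean_in, var_in = gp_posterior(x_in, y_in, x_query, noise)
    return gaussian_kl(mean_out, var_out, mean_in, var_in)

# Toy data (hypothetical): 20 points on a sine curve plus one atypical point.
rng = np.random.default_rng(0)
x_train = rng.uniform(-3.0, 3.0, size=(20, 1))
y_train = np.sin(x_train[:, 0])
x_diff = np.array([[0.5]])
y_diff = 5.0

lood_at_point = lood(x_train, y_train, x_diff, y_diff, x_query=x_diff)
lood_far_away = lood(x_train, y_train, x_diff, y_diff, x_query=np.array([[50.0]]))
```

In this toy setup, querying at the differing point yields a larger divergence than querying far from all the data, where both models fall back to the prior. Choosing the query to maximize this quantity is one way to read the "optimal query" idea in the abstract, under the assumptions stated above.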