Predictive models are commonly built from data containing personal information within the framework of empirical risk minimization (ERM). Although these models can predict with high accuracy, outputs derived from sensitive data may be susceptible to privacy attacks. Differential privacy (DP) is an appealing framework for addressing such issues because it provides mathematically provable bounds on the privacy loss incurred when releasing information from sensitive data. Previous work has concentrated primarily on applying DP to unweighted ERM. We consider an important generalization, weighted ERM (wERM), in which each individual's contribution to the objective function can be assigned a different weight. In this context, we propose the first differentially private wERM algorithm, backed by a rigorous theoretical proof of its DP guarantees under mild regularity conditions. Extending existing DP-ERM procedures to wERM paves the way for privacy-preserving learning methods for individualized treatment rules, including the popular outcome weighted learning (OWL). We evaluate the application of DP-wERM to OWL in a simulation study and in a real clinical trial of melatonin for sleep health. The empirical results demonstrate that OWL models can be trained via wERM with DP guarantees while maintaining sufficiently useful model performance. We therefore recommend that practitioners consider implementing the proposed privacy-preserving OWL procedure in real-world scenarios involving sensitive data.
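To make the notion of weighted ERM concrete, the following is a minimal, non-private sketch of a regularized weighted objective and a simple fitting routine. It is not the proposed DP-wERM algorithm: no privacy mechanism is applied, and the logistic loss, L2 penalty, gradient-descent solver, function names, and synthetic data are all assumptions chosen for illustration. The OWL-style weights (outcome divided by treatment propensity) follow the standard OWL formulation, which is usually stated with a hinge loss; a smooth logistic surrogate is swapped in here for simplicity.

```python
import numpy as np

def weighted_erm_objective(theta, X, y, weights, lam=0.1):
    """Regularized weighted empirical risk with a logistic loss.

    X: (n, p) features; y: (n,) labels in {-1, +1};
    weights: (n,) non-negative per-individual weights.
    The loss and the L2 penalty are illustrative choices (assumptions).
    """
    margins = y * (X @ theta)
    losses = np.logaddexp(0.0, -margins)  # log(1 + exp(-margin)), computed stably
    return np.mean(weights * losses) + 0.5 * lam * np.dot(theta, theta)

def fit_weighted_erm(X, y, weights, lam=0.1, lr=0.1, n_iter=500):
    """Minimize the objective above by plain gradient descent (non-private)."""
    n, p = X.shape
    theta = np.zeros(p)
    for _ in range(n_iter):
        margins = y * (X @ theta)
        # 1 / (1 + exp(m)) written via tanh to avoid overflow for large |m|
        sig_neg = 0.5 * (1.0 - np.tanh(0.5 * margins))
        grad = X.T @ (-weights * y * sig_neg) / n + lam * theta
        theta -= lr * grad
    return theta

# Synthetic randomized trial (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
A = rng.choice([-1.0, 1.0], size=n)            # treatment, assigned with probability 0.5
R = np.clip(1.0 + A * X[:, 0] + 0.1 * rng.normal(size=n), 0.0, None)  # non-negative outcome
owl_weights = R / 0.5                          # outcome / propensity, as in OWL

theta_hat = fit_weighted_erm(X, A, owl_weights)
rule = np.sign(X @ theta_hat)                  # estimated individualized treatment rule
```

Setting every weight to 1 recovers ordinary unweighted ERM, which is the setting that earlier DP-ERM work addresses.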