Existing work on differentially private linear regression typically assumes that end users can precisely set data bounds or algorithmic hyperparameters. End users often struggle to meet these requirements without directly examining the data (and thereby violating privacy). Recent work has attempted to shift these burdens from users to algorithms, but the resulting solutions struggle to provide utility as the feature dimension grows. This work extends these algorithms to higher-dimensional problems by introducing a differentially private feature selection method based on Kendall rank correlation. We prove a utility guarantee for the setting where features are normally distributed and conduct experiments across 25 datasets. We find that adding this private feature selection step before regression significantly broadens the applicability of "plug-and-play" private linear regression algorithms at little additional cost to privacy, computation, or decision-making by the end user.
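The idea of selecting features privately by Kendall rank correlation can be illustrated with a minimal sketch. This is not the paper's algorithm, only one plausible instance under stated assumptions. Each feature is scored by the absolute Kendall tau-a between the feature and the label. Under the replace-one-record neighboring relation, changing one record alters at most n - 1 pairs by at most 2 each, so tau-a changes by at most 4/n. The top k features are then chosen by adding Gumbel noise of scale 2k(4/n)/epsilon to the scores, which corresponds to k rounds of the exponential mechanism at epsilon/k each under basic composition. The function names, the tau-a normalization, the neighboring relation, and the selection mechanism are all assumptions of this sketch.

```python
import numpy as np


def kendall_tau(x, y):
    """Kendall tau-a between two 1-D arrays, computed over all pairs in O(n^2)."""
    n = len(x)
    dx = np.sign(x[:, None] - x[None, :])
    dy = np.sign(y[:, None] - y[None, :])
    # Each unordered pair appears twice in the full matrix, so halve the sum.
    concordant_minus_discordant = np.sum(dx * dy) / 2.0
    return concordant_minus_discordant / (n * (n - 1) / 2.0)


def private_top_k_features(X, y, k, epsilon, rng):
    """Hypothetical helper: noisy top-k feature indices by |Kendall tau-a|.

    Assumes replace-one-record neighboring datasets, so each score has
    sensitivity at most 4 / n. Not the selection procedure from the paper.
    """
    n, d = X.shape
    scores = np.array([abs(kendall_tau(X[:, j], y)) for j in range(d)])
    sensitivity = 4.0 / n
    # Gumbel noise at this scale matches k exponential-mechanism rounds,
    # each run with privacy budget epsilon / k.
    noise = rng.gumbel(loc=0.0, scale=2.0 * k * sensitivity / epsilon, size=d)
    # Indices of the k largest noisy scores, largest first.
    return np.argsort(scores + noise)[::-1][:k]
```

In use, the returned indices would pick the columns of `X` passed to a private regression routine, with the overall privacy budget split between selection and regression. How to split that budget is left open here.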