We study a class of private learning problems in which the data is a join of private and public features. This is often the case in private personalization tasks such as recommendation or ad prediction, in which features related to individuals are sensitive, while features related to items (the movies or songs to be recommended, or the ads to be shown to users) are publicly available and do not require protection. A natural question is whether private algorithms can achieve higher utility in the presence of public features. We give a positive answer for multi-encoder models where one of the encoders operates on public features. We develop new algorithms that take advantage of this separation by only protecting certain sufficient statistics (instead of adding noise to the gradient). This method has a guaranteed utility improvement for linear regression, and importantly, achieves the state of the art on two standard private recommendation benchmarks, demonstrating the importance of methods that adapt to the private-public feature separation.
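To make the idea of protecting only certain sufficient statistics concrete, here is a minimal sketch under strong simplifying assumptions, not the algorithm developed in this work. It assumes a plain ridge regression in which every feature is public and only the labels are private, so the Gram matrix X^T X can be computed exactly and noise is added only to X^T y. The function name `label_private_ridge`, the label bound, and the classic Gaussian-mechanism calibration (which assumes 0 < epsilon < 1 and adjacency defined by changing a single label) are illustrative choices rather than details from the source.

```python
import numpy as np


def label_private_ridge(X, y, label_bound, epsilon, delta, reg=1e-3, rng=None):
    """Ridge regression with public features X and private labels y.

    Hypothetical helper for illustration: only X^T y is perturbed,
    while X^T X is treated as a public statistic.
    """
    if not 0.0 < epsilon < 1.0:
        raise ValueError("this calibration assumes 0 < epsilon < 1")
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X, dtype=float)
    # Clip labels so that the sensitivity bound below actually holds.
    y = np.clip(np.asarray(y, dtype=float), -label_bound, label_bound)
    d = X.shape[1]

    # Public statistic: depends on public features only, so no noise is added.
    gram = X.T @ X

    # Private statistic: changing one label moves X^T y by at most
    # 2 * label_bound * ||x_i||, and the row norms are public.
    max_row_norm = np.linalg.norm(X, axis=1).max()
    sensitivity = 2.0 * label_bound * max_row_norm

    # Classic Gaussian mechanism noise scale for (epsilon, delta)-DP.
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    noisy_xty = X.T @ y + rng.normal(scale=sigma, size=d)

    # Solve the regularized normal equations with the noisy statistic.
    return np.linalg.solve(gram + reg * np.eye(d), noisy_xty)


def make_synthetic_data(n, theta, rng):
    """Synthetic data with unit-norm feature rows (assumption for the demo)."""
    X = rng.normal(size=(n, theta.shape[0]))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    y = X @ theta + 0.05 * rng.normal(size=n)
    return X, y


demo_rng = np.random.default_rng(0)
theta_true = np.array([0.5, -0.3, 0.2])
X_demo, y_demo = make_synthetic_data(5000, theta_true, demo_rng)
theta_hat = label_private_ridge(
    X_demo, y_demo, label_bound=1.0, epsilon=0.5, delta=1e-5, rng=demo_rng
)
print("estimation error:", np.linalg.norm(theta_hat - theta_true))
```

In this simplified setting the noise enters once, through a single d-dimensional vector, rather than at every gradient step. The setting studied here is more general, since it also involves private features, so the sketch should be read only as intuition for why separating public from private quantities can reduce the amount of noise required.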