Many learning models used in consequential domains, such as legal, banking, hiring, and healthcare decision-making, rely on potentially sensitive user information to carry out inference. Moreover, the complete set of features is typically required to perform inference. This not only poses severe privacy risks for the individuals using these learning systems, but also demands substantial human effort from companies and organizations to verify the correctness of the released information. This paper asks whether \emph{all} input features are necessary for a model to return accurate predictions at test time and shows that, in a personalized setting, each individual may need to release only a small subset of these features without affecting the final decision. The paper also provides an efficient sequential algorithm that chooses which attributes each individual should provide. Evaluation over several learning tasks shows that individuals may be able to report as little as 10\% of their information while retaining the same level of accuracy as a model that uses the complete user information.