User attribute prediction is a crucial task in many industries. However, sharing user data across organizations is hindered by privacy concerns and by legal requirements on personally identifiable information. Regulations such as the General Data Protection Regulation (GDPR) in the European Union and the Personal Information Protection Law of the People's Republic of China restrict data sharing. Federated learning algorithms have been proposed to use features from multiple clients while complying with these requirements: they aim to predict user attributes without directly sharing the data. Existing approaches, however, typically rely on matching users across companies, which can let dishonest partners discover user lists or can prevent the use of all available features. In this paper, we propose a novel algorithm for predicting user attributes that requires no user matching. Our approach trains deep matrix factorization models on different clients and shares only the item vectors, so user attributes can be predicted without sharing the user vectors themselves. We evaluate the algorithm on the publicly available MovieLens dataset and show that it performs comparably to the FedAvg algorithm, reaching 96% of a single model's accuracy. The proposed algorithm is particularly well suited to improving customer targeting and enhancing the overall customer experience. The paper thus contributes to user attribute prediction an algorithm that addresses some of the most pressing privacy concerns in this area.
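The idea of sharing only item vectors can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes plain (not deep) matrix factorization, NumPy, an item catalogue common to all clients, and a server that simply averages the clients' item vectors each round. The function names `local_update` and `federated_item_sharing` and all hyperparameters are hypothetical.

```python
import numpy as np


def local_update(ratings, mask, user_vecs, item_vecs, lr=0.01, reg=0.05, steps=20):
    """Run a few gradient steps of plain matrix factorization on one client.

    `mask` is 1 where a rating is observed and 0 elsewhere.
    """
    U, V = user_vecs.copy(), item_vecs.copy()
    for _ in range(steps):
        err = mask * (ratings - U @ V.T)   # error on observed entries only
        grad_U = -err @ V + reg * U
        grad_V = -err.T @ U + reg * V
        U -= lr * grad_U
        V -= lr * grad_V
    return U, V


def federated_item_sharing(client_ratings, client_masks, dim=8, rounds=30, seed=0):
    """Assumed protocol: clients keep user vectors, server averages item vectors."""
    rng = np.random.default_rng(seed)
    n_items = client_ratings[0].shape[1]
    item_vecs = 0.1 * rng.standard_normal((n_items, dim))
    # User vectors are created and stored per client; they are never aggregated.
    user_vecs = [0.1 * rng.standard_normal((r.shape[0], dim)) for r in client_ratings]
    for _ in range(rounds):
        uploaded = []
        for k, (r, m) in enumerate(zip(client_ratings, client_masks)):
            user_vecs[k], v_k = local_update(r, m, user_vecs[k], item_vecs)
            uploaded.append(v_k)           # only item vectors leave the client
        item_vecs = np.mean(uploaded, axis=0)
    return user_vecs, item_vecs
```

Because every client fits its user vectors against the same averaged item vectors, the user vectors end up in a common latent space even though no user identifiers or user vectors are exchanged. How those vectors are then used to predict attributes is left out of this sketch.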