Automated Data Denoising for Recommendation

In real-world scenarios, most platforms collect both large-scale, naturally noisy implicit feedback and small-scale yet highly relevant explicit feedback. Because of data sparsity, implicit feedback is often the default choice for training recommender systems (RS). However, such data can be very noisy due to the randomness and diversity of user behaviors. For instance, a large portion of clicks may not reflect true user preferences, and many purchases may end in negative reviews or returns. Fortunately, by using the strengths of each type of feedback to compensate for the weaknesses of the other, we can mitigate this issue at almost no cost. In this work, we propose an Automated Data Denoising framework for recommendation, AutoDenoise, which uses a small amount of explicit data as a validation set to guide the training of the recommender. Inspired by the generalized definition of curriculum learning (CL), AutoDenoise learns to automatically and dynamically assign the most appropriate (discrete or continuous) weight to each implicit data sample throughout training, guided by validation performance. Specifically, we use a carefully designed controller network to generate the weights, combine the weights with the loss of each input sample to train the recommender system, and optimize the controller with reinforcement learning to maximize the expected accuracy of the trained RS on the noise-free validation set. Extensive experiments indicate that AutoDenoise improves the performance of state-of-the-art recommendation algorithms on several public benchmark datasets.
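The training loop described in the abstract (a controller produces per-sample weights, the recommender is trained on the weighted loss, and the controller is rewarded with accuracy on the clean validation set) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the recommender is a plain logistic regression, the controller is a linear Bernoulli policy emitting discrete keep/drop weights, the controller features are simply the input and its noisy label, and the update is a basic REINFORCE step with a fixed baseline. All function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def make_toy_data(rng, n=200, n_val=50, d=5, noise=0.3):
    """Noisy 'implicit' training labels and clean 'explicit' validation labels."""
    true_theta = rng.normal(size=d)
    X = rng.normal(size=(n, d))
    y_clean = (X @ true_theta > 0).astype(float)
    flip = rng.random(n) < noise               # simulate noisy implicit feedback
    y = np.where(flip, 1.0 - y_clean, y_clean)
    X_val = rng.normal(size=(n_val, d))
    y_val = (X_val @ true_theta > 0).astype(float)
    return X, y, X_val, y_val

def weighted_logloss_grad(theta, X, y, w):
    """Gradient of the log loss with each sample's term scaled by its weight."""
    p = sigmoid(X @ theta)
    return X.T @ (w * (p - y)) / max(w.sum(), 1.0)

def train_recommender(X, y, w, steps=200, lr=0.5):
    """Stand-in 'recommender': logistic regression trained on the weighted loss."""
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        theta -= lr * weighted_logloss_grad(theta, X, y, w)
    return theta

def accuracy(theta, X, y):
    return float(np.mean((sigmoid(X @ theta) > 0.5) == (y > 0.5)))

def controller_probs(phi, F):
    """Probability that the controller keeps (weight 1) each training sample."""
    return sigmoid(F @ phi)

def reinforce_step(phi, F, X, y, X_val, y_val, baseline, rng, lr=0.1):
    """One REINFORCE update of the controller, rewarded by validation accuracy."""
    p = controller_probs(phi, F)
    a = (rng.random(len(p)) < p).astype(float)   # sampled discrete weights
    theta = train_recommender(X, y, a)
    reward = accuracy(theta, X_val, y_val)
    grad_logp = F.T @ (a - p)                    # d/dphi of log Bernoulli(a; p)
    phi = phi + lr * (reward - baseline) * grad_logp / len(p)
    return phi, reward

rng = np.random.default_rng(0)
X, y, X_val, y_val = make_toy_data(rng)
F = np.column_stack([X, y, np.ones(len(y))])     # assumed controller features
phi = np.zeros(F.shape[1])
for _ in range(20):
    phi, reward = reinforce_step(phi, F, X, y, X_val, y_val, baseline=0.5, rng=rng)
print("validation accuracy of last sampled run:", reward)
```

The sketch retrains the recommender from scratch for every sampled weighting, which is only practical at toy scale; it is meant to show how the weights, the weighted loss, and the validation reward connect, not to reproduce the reported results.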
