Tenrec: A Large-scale Multipurpose Benchmark Dataset for Recommender Systems

Existing benchmark datasets for recommender systems (RS) are either small in scale or capture very limited forms of user feedback. RS models evaluated on such datasets often lack practical value for large-scale real-world applications. In this paper, we describe Tenrec, a novel and publicly available data collection for RS that records various types of user feedback from four different recommendation scenarios. Specifically, Tenrec has the following five characteristics: (1) it is large-scale, containing around 5 million users and 140 million interactions; (2) it has not only positive user feedback but also true negative feedback (as opposed to one-class recommendation); (3) it contains overlapping users and items across the four scenarios; (4) it contains various types of positive user feedback, such as clicks, likes, shares, and follows; (5) it contains additional features beyond user IDs and item IDs. We verify Tenrec on ten diverse recommendation tasks by running several classical baseline models per task. Tenrec has the potential to become a useful benchmark dataset for a majority of popular recommendation tasks.

