Towards Automated Negative Sampling in Implicit Recommendation

Negative sampling methods are vital in implicit recommendation models because they allow us to obtain negative instances from massive unlabeled data. Most existing approaches focus on sampling hard negative samples in various ways, and they are designed to be orthogonal to the recommendation model and the implicit dataset. However, this contradicts the common belief in AutoML that the model and the dataset should be matched. Empirical experiments suggest that the best-performing negative sampler depends on both the implicit dataset and the specific recommendation model. Hence, we hypothesize that the negative sampler should align with the capacity of the recommendation model as well as the statistics of the dataset to achieve optimal performance, and that a mismatch among these three results in sub-optimal outcomes. An intuitive way to address the mismatch is to exhaustively evaluate every candidate and select the best-performing negative sampler for the given model and dataset. However, such an approach is computationally expensive and time-consuming, leaving the problem unsolved. In this work, we propose the AutoSample framework, which adaptively selects the best-performing negative sampler among candidates. Specifically, we propose a loss-to-instance approximation that transforms the negative sampler search task into a learning task over a weighted sum, enabling end-to-end training of the model. We also design an adaptive search algorithm to explore the search space extensively and efficiently. In addition, we devise a specific initialization approach that better reuses the model parameters obtained during the search stage; it is similar to curriculum learning and leads to better performance and lower computational resource consumption. We evaluate the proposed framework on four benchmarks over three models. Extensive experiments demonstrate the effectiveness and efficiency of our proposed framework.
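
The abstract does not spell out how the search over samplers becomes "a learning task over a weighted sum". As a rough illustration only, the sketch below assumes one common relaxation: each candidate sampler contributes a BPR-style loss, and the losses are mixed with softmax weights over learnable logits. The names `alpha`, `mixed_loss`, `uniform_sampler`, and `hard_sampler`, the choice of BPR loss, and the two candidate samplers are all assumptions made for this example, not details taken from AutoSample.

```python
import math
import random


def softmax(logits):
    """Turn sampler logits into positive weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bpr_loss(pos_score, neg_score):
    """-log(sigmoid(pos - neg)), written as a numerically stable softplus."""
    x = neg_score - pos_score
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def uniform_sampler(scores, positives, rng):
    """Hypothetical candidate 1: pick any non-positive item uniformly."""
    candidates = [i for i in range(len(scores)) if i not in positives]
    return rng.choice(candidates)


def hard_sampler(scores, positives, rng, pool_size=5):
    """Hypothetical candidate 2: highest-scored item in a random pool."""
    candidates = [i for i in range(len(scores)) if i not in positives]
    pool = rng.sample(candidates, min(pool_size, len(candidates)))
    return max(pool, key=lambda i: scores[i])


def mixed_loss(scores, pos_item, positives, alpha, samplers, rng):
    """Softmax-weighted sum of per-sampler losses and its gradient in alpha."""
    weights = softmax(alpha)
    losses = [
        bpr_loss(scores[pos_item], scores[sampler(scores, positives, rng)])
        for sampler in samplers
    ]
    total = sum(w * l for w, l in zip(weights, losses))
    # d(total)/d(alpha_k) = w_k * (loss_k - total) for a softmax mixture.
    grad_alpha = [w * (l - total) for w, l in zip(weights, losses)]
    return total, losses, grad_alpha


# Toy usage: one user, six items, item 0 is the observed positive.
rng = random.Random(0)
scores = [2.0, 0.5, 1.5, -1.0, 0.1, 0.9]  # made-up model scores
alpha = [0.0, 0.0]                         # one logit per candidate sampler
total, losses, grad_alpha = mixed_loss(
    scores, 0, {0}, alpha, [uniform_sampler, hard_sampler], rng
)
# One gradient step on the sampler logits (learning rate is arbitrary).
alpha = [a - 0.1 * g for a, g in zip(alpha, grad_alpha)]
```

Because the mixture is differentiable in `alpha`, the sampler weights can in principle be updated by gradient descent alongside the model parameters, which is one way to read "end-to-end training" here. The loss-to-instance approximation and the adaptive search algorithm named in the abstract are not reproduced by this sketch.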

