Negative sampling methods are vital in implicit recommendation models because they provide negative instances from massive unlabeled data. Most existing approaches focus on sampling hard negatives in various ways, and they are designed to be orthogonal to the recommendation model and the implicit dataset. However, this orthogonality contradicts the common belief in AutoML that the model and the dataset should be matched. Empirical experiments suggest that the best-performing negative sampler depends on both the implicit dataset and the specific recommendation model. Hence, we hypothesize that, to achieve optimal performance, the negative sampler should align with the capacity of the recommendation model as well as the statistics of the dataset; a mismatch among these three results in sub-optimal outcomes. An intuitive way to address the mismatch is to exhaustively evaluate the candidates and select the best-performing negative sampler for the given model and dataset. However, such a search is computationally expensive and time-consuming, so it does not solve the problem in practice. In this work, we propose the AutoSample framework, which adaptively selects the best-performing negative sampler among candidates. Specifically, we propose a loss-to-instance approximation that transforms the negative sampler search task into a learning task over a weighted sum, enabling end-to-end training of the model. We also design an adaptive search algorithm to explore the search space extensively and efficiently. A specific initialization approach is also proposed to better utilize the obtained model parameters during the search stage; it is similar to curriculum learning and leads to better performance and lower computational resource consumption. We evaluate the proposed framework on four benchmarks over three models. Extensive experiments demonstrate the effectiveness and efficiency of our proposed framework.
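The abstract does not spell out the loss-to-instance approximation, so the following is only a minimal sketch of the general idea of turning a discrete choice among candidate samplers into a learnable weighted sum of per-sampler losses. The BPR loss, the softmax parameterization of the weights, and every name here (`alpha`, `bpr_loss`, `mixed_loss`, `alpha_gradient`) are assumptions made for illustration, not the paper's definitions.

```python
import math


def softmax(alpha):
    """Turn unconstrained sampler logits into weights that sum to 1."""
    m = max(alpha)
    exps = [math.exp(a - m) for a in alpha]
    total = sum(exps)
    return [e / total for e in exps]


def bpr_loss(pos_score, neg_score):
    """BPR loss -log(sigmoid(pos - neg)), computed as a stable softplus.

    Assumption: the abstract does not say which loss the paper uses.
    """
    x = neg_score - pos_score
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mixed_loss(alpha, pos_score, neg_scores):
    """Weighted sum of the losses induced by each candidate sampler.

    neg_scores[k] is the model score of the negative item drawn by
    (hypothetical) candidate sampler k for the same user.
    """
    weights = softmax(alpha)
    losses = [bpr_loss(pos_score, s) for s in neg_scores]
    return sum(w * l for w, l in zip(weights, losses))


def alpha_gradient(alpha, pos_score, neg_scores):
    """Analytic gradient of mixed_loss with respect to the logits alpha."""
    weights = softmax(alpha)
    losses = [bpr_loss(pos_score, s) for s in neg_scores]
    mean = sum(w * l for w, l in zip(weights, losses))
    return [w * (l - mean) for w, l in zip(weights, losses)]
```

Because the mixed loss is differentiable in `alpha`, the sampler weights can be updated by gradient steps alongside the model parameters instead of retraining once per candidate. How the actual framework updates and discretizes these weights is not described in the abstract.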