Matrix factorization (MF) is a widely used collaborative filtering (CF) algorithm for recommendation systems (RSs), owing to its high prediction accuracy, flexibility, and efficiency in big data processing. However, as the number of users and items in current RSs grows dramatically, the computational cost of training an MF model rises sharply. Many existing works accelerate MF either by adding computational resources or by using parallel systems, both of which incur a large cost. In this paper, we propose algorithmic methods that accelerate MF without requiring any additional computational resources. Specifically, we observe fine-grained structured sparsity in the decomposed feature matrices once entries below a certain threshold are treated as insignificant. This sparsity causes a large number of unnecessary operations during both matrix multiplication and latent factor update, increasing the time of the MF training process. Based on this observation, we first propose to rearrange the feature matrices according to their joint sparsity, which tends to make a latent vector with a smaller index denser than one with a larger index. The rearrangement limits the error introduced by the pruning step that follows. We then propose to prune the insignificant latent factors through an early stopping process during both matrix multiplication and latent factor update. The pruning is performed dynamically, according to the sparsity of the latent factors of each user/item, to accelerate training. Experiments show that our method achieves speedups of 1.2× to 1.65×, at the cost of an error increase of up to 20.08%, compared with the conventional MF training process. We also show that the proposed methods remain applicable under different hyperparameter settings, including the optimizer, optimization strategy, and initialization method.
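To make the two steps described in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. The abstract does not specify how joint sparsity is measured or where early stopping cuts off, so several choices here are assumptions: joint sparsity is taken as the per-dimension count of entries whose magnitude reaches the threshold, summed over both feature matrices; the stopping point for a user/item pair is the smaller of the two vectors' last significant indices; and the update is plain SGD with L2 regularization. The function names (`rearrange_by_joint_sparsity`, `effective_length`, `pruned_predict`, `pruned_sgd_step`) and the parameters `lr` and `reg` are hypothetical.

```python
import numpy as np


def rearrange_by_joint_sparsity(P, Q, threshold):
    """Reorder latent dimensions so that denser ones get smaller indices.

    P has shape (num_users, k) and Q has shape (num_items, k).
    Assumption: density of a dimension is the number of entries with
    |value| >= threshold, counted jointly over P and Q.
    """
    significant = (np.abs(P) >= threshold).sum(axis=0) \
        + (np.abs(Q) >= threshold).sum(axis=0)
    # Stable sort in descending order of density.
    order = np.argsort(-significant, kind="stable")
    return P[:, order], Q[:, order]


def effective_length(vec, threshold):
    """Return one past the index of the last significant factor (0 if none)."""
    idx = np.flatnonzero(np.abs(vec) >= threshold)
    return int(idx[-1]) + 1 if idx.size else 0


def pruned_predict(p_u, q_i, threshold):
    """Dot product that stops early once either vector has no more
    significant factors (assumed stopping rule)."""
    t = min(effective_length(p_u, threshold), effective_length(q_i, threshold))
    return float(p_u[:t] @ q_i[:t]), t


def pruned_sgd_step(P, Q, u, i, rating, threshold, lr=0.01, reg=0.02):
    """One SGD update that touches only the first t latent factors."""
    pred, t = pruned_predict(P[u], Q[i], threshold)
    err = rating - pred
    p_old = P[u, :t].copy()  # keep the pre-update user factors for Q's update
    P[u, :t] += lr * (err * Q[i, :t] - reg * p_old)
    Q[i, :t] += lr * (err * p_old - reg * Q[i, :t])
    return err
```

In this sketch the rearrangement is what makes truncation reasonable: after sorting, the factors skipped by `pruned_predict` are mostly the ones below the threshold, so the truncated dot product stays close to the full one while using fewer multiplications.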