As recommender systems become increasingly prevalent, the environmental impact and energy efficiency of training large-scale models have come under scrutiny. This paper investigates the potential for energy-efficient algorithm performance by reducing dataset size through downsampling, in the context of Green Recommender Systems. We conducted experiments on the MovieLens 100K, 1M, and 10M datasets and the Amazon Toys and Games dataset, analyzing the performance of several recommender algorithms when trained on varying fractions of the data. Our results indicate that while more training data generally yields higher algorithm performance, certain algorithms, such as FunkSVD and BiasedMF, maintain high-quality recommendations even with up to a 50% reduction in training data, particularly on unbalanced and sparse datasets like Amazon Toys and Games, achieving nDCG@10 scores within approximately 13% of full-dataset performance. These findings suggest that strategic dataset reduction can decrease computational and environmental costs without substantially compromising recommendation quality. This study advances sustainable and green recommender systems by providing insights for reducing energy consumption while maintaining effectiveness.
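To make the evaluation metric concrete, the following is a minimal, self-contained sketch of nDCG@10 as commonly defined (DCG discounted by log2 of rank, normalized by the ideal ordering); it is an illustration only, not the authors' evaluation code:

```python
import math

def dcg_at_k(relevances, k=10):
    # DCG@k = sum over the top-k positions of rel_i / log2(i + 1),
    # where i is the 1-based rank of the item in the recommendation list.
    return sum(rel / math.log2(i + 1)
               for i, rel in enumerate(relevances[:k], start=1))

def ndcg_at_k(relevances, k=10):
    # Normalize by the ideal DCG (the same relevances sorted descending),
    # so a perfectly ordered list scores 1.0.
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Example: graded relevances of a recommendation list, in ranked order.
print(ndcg_at_k([3, 2, 3, 0, 1], k=10))
```

Under this metric, "within approximately 13% of full-dataset performance" means the nDCG@10 obtained after downsampling is at least about 87% of the score achieved when training on the full dataset.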