As recommender systems become increasingly prevalent, the environmental impact and energy efficiency of training large-scale models have come under scrutiny. This paper investigates the potential for energy-efficient algorithm performance by reducing dataset size through downsampling, in the context of Green Recommender Systems. We conducted experiments on the MovieLens 100K, 1M, and 10M datasets and the Amazon Toys and Games dataset, analyzing the performance of several recommender algorithms when trained on varying fractions of the data. Our results indicate that while more training data generally yields higher algorithm performance, certain algorithms, such as FunkSVD and BiasedMF, maintain high-quality recommendations even with up to a 50% reduction in training data, particularly on unbalanced and sparse datasets like Amazon Toys and Games, achieving nDCG@10 scores within approximately 13% of full-dataset performance. These findings suggest that strategic dataset reduction can decrease computational and environmental costs without substantially compromising recommendation quality. This study advances sustainable and green recommender systems by providing insights for reducing energy consumption while maintaining effectiveness.
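To make the evaluation metric concrete, the following is a minimal, self-contained sketch of nDCG@10 as commonly defined (DCG discounted by log2 of rank, normalized by the ideal ordering); it is an illustration only, not the authors' evaluation code:

```python
import math

def dcg_at_k(relevances, k=10):
    # DCG@k = sum over the top-k positions of rel_i / log2(i + 1),
    # where i is the 1-based rank of the item in the recommendation list.
    return sum(rel / math.log2(i + 1)
               for i, rel in enumerate(relevances[:k], start=1))

def ndcg_at_k(relevances, k=10):
    # Normalize by the ideal DCG (the same relevances sorted descending),
    # so a perfectly ordered list scores 1.0.
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Example: graded relevances of a recommendation list, in ranked order.
print(ndcg_at_k([3, 2, 3, 0, 1], k=10))
```

Under this metric, "within approximately 13% of full-dataset performance" means the nDCG@10 obtained after downsampling is at least about 87% of the score achieved when training on the full dataset.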