The substantial increase in AI model training has considerable environmental implications, calling for more energy-efficient and sustainable AI practices. On the one hand, data-centric approaches show great potential for training energy-efficient AI models. On the other hand, instance selection methods can train AI models on minimised training sets with negligible performance degradation. Despite growing interest in both topics, the impact of data-centric training set selection on energy efficiency remains unexplored to date. This paper presents an evolutionary-based sampling framework aimed at (i) identifying elite training samples tailored to dataset and model pairs, (ii) comparing model performance and energy efficiency gains against typical model training practice, and (iii) investigating the feasibility of this framework for fostering sustainable model training practices. To evaluate the proposed framework, we conducted an empirical experiment covering 8 commonly used AI classification models and 25 publicly available datasets. The results show that, when trained on only 10% elite training samples, the models can achieve a 50% performance improvement and energy savings of 98% compared to common training practice.