Addressing the so-called ``Red-AI'' trend of rising energy consumption by large-scale neural networks, this study uses node-level watt-meters to measure the energy actually consumed when training a range of fully connected neural network architectures. We introduce the BUTTER-E dataset, an augmentation of the BUTTER Empirical Deep Learning dataset, containing energy consumption and performance data from 63,527 individual experimental runs spanning 30,582 distinct configurations: 13 datasets, 20 sizes (number of trainable parameters), 8 network ``shapes'', and 14 depths, on both CPU and GPU hardware. The dataset reveals a complex relationship between dataset size, network structure, and energy use, and highlights the impact of cache effects. We propose a straightforward and effective energy model that accounts for network size, computation, and the memory hierarchy. Our analysis also uncovers a surprising, hardware-mediated non-linear relationship between energy efficiency and network design, challenging the assumption that reducing the number of parameters or FLOPs is the best way to achieve greater energy efficiency. Highlighting the need for cache-considerate algorithm development, we suggest a combined approach to energy-efficient network, algorithm, and hardware design. This work contributes to the fields of sustainable computing and Green AI, offering practical guidance for creating more energy-efficient neural networks and promoting sustainable AI.
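To make concrete the idea of an energy model that accounts for network size, computation, and the memory hierarchy, the following is a minimal illustrative sketch, not the model fitted in this work. Every constant, the three-level hierarchy, the count of 6 FLOPs per parameter per training example, the three passes over the weights, and all names (`MemoryLevel`, `level_for`, `energy_per_example`) are assumptions chosen only to show how a cache threshold can make energy per parameter non-linear in network size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryLevel:
    name: str
    capacity_bytes: float   # hypothetical capacity, not a measured value
    joules_per_byte: float  # hypothetical access cost, not a measured value


# Illustrative hierarchy, ordered from smallest/cheapest to largest/costliest.
HIERARCHY = (
    MemoryLevel("L2", 1e6, 1e-11),
    MemoryLevel("L3", 3e7, 5e-11),
    MemoryLevel("DRAM", float("inf"), 5e-10),
)

JOULES_PER_FLOP = 1e-11   # assumed compute cost
FLOPS_PER_PARAM = 6       # assumed: ~2 forward + ~4 backward per example
PASSES_OVER_WEIGHTS = 3   # assumed: forward, backward, and update passes


def level_for(working_set_bytes: float) -> MemoryLevel:
    """Return the smallest memory level that holds the working set."""
    for level in HIERARCHY:
        if working_set_bytes <= level.capacity_bytes:
            return level
    return HIERARCHY[-1]


def energy_per_example(num_params: int, bytes_per_param: int = 4) -> float:
    """Estimate joules per training example as compute plus memory traffic."""
    working_set = num_params * bytes_per_param
    level = level_for(working_set)
    compute = FLOPS_PER_PARAM * num_params * JOULES_PER_FLOP
    traffic = PASSES_OVER_WEIGHTS * working_set * level.joules_per_byte
    return compute + traffic


if __name__ == "__main__":
    for n in (100_000, 5_000_000, 100_000_000):
        e = energy_per_example(n)
        print(f"{n:>11,d} params: {level_for(n * 4).name:>4}, "
              f"{e / n:.3e} J per parameter per example")
```

Under these assumed constants, the energy per parameter jumps each time the weights outgrow a memory level, so a network with fewer parameters is not automatically proportionally cheaper. A model fitted to measured data would replace the constants with values estimated from watt-meter readings.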