Monitoring energy behavior in AI data centers is crucial, both to reduce their energy consumption and to raise awareness among their users, who are key actors in the AI field. This paper presents a proof of concept for easy, lightweight monitoring of energy behavior at the scale of a whole data center, a single user, or a single job submission. Our system relies on software wattmeters, and we validate the setup against accurate per-node external wattmeters. The results indicate worthwhile potential for efficiency improvements and provide arguments for building user engagement through energy monitoring.