In this paper we present D.A.V.I.D.E. (Development for an Added Value Infrastructure Designed in Europe), an innovative, energy-efficient High Performance Computing cluster designed by E4 Computer Engineering for PRACE (Partnership for Advanced Computing in Europe). D.A.V.I.D.E. is built from best-in-class components (IBM's POWER8-NVLink CPUs, NVIDIA Tesla P100 GPUs, and Mellanox InfiniBand EDR 100 Gb/s networking), together with custom hardware and innovative system middleware. D.A.V.I.D.E. features (i) a dedicated power monitoring interface, built around the BeagleBone Black board, which samples power at high frequency directly from the power backplane and integrates scalably with the internal node telemetry and the system-level power management software; (ii) a custom-built chassis based on the OpenRack form factor, with liquid cooling that allows the system to be deployed in modern, energy-efficient datacenters; and (iii) software components that enable fine-grained power monitoring, power management (i.e., power capping and energy-aware job scheduling), and application power profiling, based on dedicated machine learning components. Software APIs allow developers and users to tune compute node performance and power consumption to the requirements of their applications. The first pilot system, which we will deploy at the beginning of 2017, will demonstrate key HPC applications from different fields ported to and optimized for this innovative platform.
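High-frequency power samples become useful for energy accounting and application power profiling once they are integrated over time. The following is a minimal sketch, not the D.A.V.I.D.E. software. It assumes samples arrive as hypothetical `(timestamp_s, power_w)` pairs, and it uses the generic trapezoidal rule to estimate the energy consumed over a sampling window. The function names `energy_joules` and `average_power_watts` are illustrative only.

```python
"""Sketch: estimate energy from high-frequency power samples.

Assumption (hypothetical): each sample is a (timestamp_s, power_w) pair, as a
backplane power monitor might report; this is NOT D.A.V.I.D.E.'s actual API.
"""
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (time in seconds, power in watts)


def energy_joules(samples: Sequence[Sample]) -> float:
    """Integrate power over time with the trapezoidal rule; returns joules."""
    if len(samples) < 2:
        return 0.0
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        if t1 < t0:
            raise ValueError("timestamps must be non-decreasing")
        # Area of the trapezoid between two consecutive samples (W * s = J).
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


def average_power_watts(samples: Sequence[Sample]) -> float:
    """Mean power over the sampled window: energy divided by duration."""
    duration = samples[-1][0] - samples[0][0]
    if duration <= 0:
        raise ValueError("need samples spanning a positive duration")
    return energy_joules(samples) / duration


if __name__ == "__main__":
    window = [(0.0, 100.0), (0.5, 200.0), (1.0, 100.0)]
    print(energy_joules(window))        # → 150.0
    print(average_power_watts(window))  # → 150.0
```

The trapezoidal rule tolerates non-uniform sample spacing. Sampling at a higher rate reduces the chance of missing short power transients, which is one motivation for sampling close to the power backplane.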