In 2015 the Interdisciplinary Centre for Mathematical and Computational Modelling (ICM) at the University of Warsaw built a modern datacenter and installed three substantial HPC systems as part of the 168 M PLN (36 M Euro) OCEAN project. Some of these systems were ill-conceived and badly architected, and over the five years of their life span they have delivered minimal ROI. This paper reports on an intensive two-year effort to reengineer two of these HPC systems into a hybrid, multi-cloud solution called A-CHOICeM (Akademicka CHmura Obliczeniowa ICM). The intention was to expand the user base from the roughly 200 to 500 users of a typical ICM HPC system to about 100,000 potential general academic users from all institutions of higher learning in the Warsaw area. The main characteristics of the solution are the integration of the on-premises ICM Cloud with several public cloud providers, solutions tailored to particular groups of academic users, containerization, and the integration of special computational paradigms such as AI and Quantum Computing. The paper presents the full process of designing the solution, the competitive dialogue with suppliers, and the complete final specifications. It also describes several roadblocks, pitfalls and difficulties encountered along the way, including the conservative attitude of "old school" HPC admins, University bureaucracy, and national funding policies, among others.