Securing high-performance computing (HPC) involves a unique threat model. Unlike well-isolated VM tenants in public clouds, untrusted, malicious code that exploits the concentrated computing power of HPC can have an outsized impact on its shared, open-networked environment. Therefore, preempting attacks on supercomputing systems before they cause damage remains the top security priority. The main challenge is that noisy attack attempts and unreliable alerts often mask \emph{real attacks}, which cause permanent damage such as system integrity violations and data breaches. This paper describes a security testbed embedded in the live traffic of a supercomputer at the National Center for Supercomputing Applications (NCSA). The objective is to demonstrate attack \textit{preemption}, i.e., stopping system compromise and data breaches on petascale supercomputers. Deploying our testbed at NCSA enables the following key contributions: 1) insights from characterizing unique \textit{attack patterns} found in real security logs of more than 200 security incidents curated over the past two decades at NCSA; 2) deployment of an attack visualization tool that illustrates the challenges of identifying real attacks in HPC environments and supports security operators in interactive attack analyses; and 3) a demonstration of the testbed's utility by running novel models, such as factor-graph-based models, to preempt a real-world ransomware family.