Jupyter Notebook Attacks Taxonomy: Ransomware, Data Exfiltration, and Security Misconfiguration

From arXiv. Accepted to the 11th Annual International Workshop on Innovating the Network for Data-Intensive Science (INDIS 2024), co-located with the International Conference for High Performance Computing, Networking, Storage, and Analysis (Supercomputing).

Open-science collaboration using Jupyter Notebooks may expose expensively trained AI models, high-performance computing (HPC) resources, and training data to security risks such as unauthorized access, accidental deletion, or misuse. The ubiquitous deployment of Jupyter Notebooks (about 11 million public notebooks on GitHub) has transformed collaborative scientific computing by enabling reproducible research. Jupyter is the main HPC science gateway interface between AI researchers and supercomputers at academic institutions such as the National Center for Supercomputing Applications (NCSA), at national labs, and in industry. An impactful attack targeting Jupyter could disrupt scientific missions and business operations. This paper describes a taxonomy of network-based attacks on Jupyter Notebooks, covering ransomware, data exfiltration, security misconfiguration, and resource abuse for cryptocurrency mining. The open nature of Jupyter (direct data access, arbitrary code execution in kernels for multiple programming languages) and its broad attack surface (terminal, file browser, untrusted cells) also attract attacks that attempt to misuse supercomputing resources and steal state-of-the-art research artifacts. Jupyter traffic travels as encrypted messages over rapidly evolving WebSocket protocols, which challenges even state-of-the-art network observability tools such as Zeek. We envisage that even more sophisticated AI-driven attacks could be adapted to target Jupyter, where defenders have limited visibility. In addition, Jupyter's cryptographic design should be adapted to resist emerging quantum threats. Overall, this is the first paper to systematically describe the threat model against Jupyter Notebooks and to lay out a design for auditing Jupyter to gain better visibility into such attacks.
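
To make the "security misconfiguration" category concrete, here is a minimal sketch of a configuration check. It is an illustration written for this note, not the paper's auditing design. The function name `audit_server_config` and the rule set are hypothetical. The setting names (`ip`, `token`, `password`, `allow_origin`, `disable_check_xsrf`, `allow_root`) are assumed to mirror Jupyter Server's `ServerApp` options, and the input is assumed to be a plain dict already extracted from a config file.

```python
from typing import Any, Dict, List


def audit_server_config(cfg: Dict[str, Any]) -> List[str]:
    """Return findings for risky values in a dict of ServerApp-style settings.

    Hypothetical helper: the rules below are illustrative assumptions,
    not an exhaustive or authoritative checklist.
    """
    findings = []
    # An explicitly empty token with no password leaves the server unauthenticated.
    # A missing "token" key is not flagged, since a token may be generated at startup.
    if cfg.get("token") == "" and not cfg.get("password"):
        findings.append("authentication disabled (empty token, no password)")
    # Binding to a wildcard address exposes the server beyond localhost.
    if cfg.get("ip") in ("0.0.0.0", "::"):
        findings.append("listening on all network interfaces")
    if cfg.get("allow_origin") == "*":
        findings.append("cross-origin requests allowed from any origin")
    if cfg.get("disable_check_xsrf") is True:
        findings.append("XSRF protection disabled")
    if cfg.get("allow_root") is True:
        findings.append("server allowed to run as root")
    return findings


if __name__ == "__main__":
    # Example of an exposed configuration (values invented for illustration).
    exposed = {"ip": "0.0.0.0", "token": "", "password": "", "allow_origin": "*"}
    for finding in audit_server_config(exposed):
        print("WARNING:", finding)
```

A static check like this only covers the misconfiguration part of the taxonomy. Ransomware, exfiltration, and mining activity happen at run time, which is where the network-level visibility discussed in the abstract comes in.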
