Machine learning relies on randomness as a fundamental component of steps such as data sampling, data augmentation, weight initialization, and optimization. Most machine learning frameworks use pseudorandom number generators (PRNGs) as their source of randomness. However, variations in design choices and implementations across frameworks, software dependencies, and hardware backends, combined with a lack of statistical validation, can open previously unexplored attack vectors on machine learning systems. Attacks on randomness sources can be extremely covert and have a history of exploitation in real-world systems. In this work, we examine the role of randomness in the machine learning development pipeline from an adversarial point of view and analyze the PRNG implementations in major machine learning frameworks. We present RNGGuard, a tool that helps machine learning engineers secure their systems with low effort. RNGGuard statically analyzes a target library's source code to identify random functions and the modules that use them; at runtime, it enforces secure execution by replacing insecure function calls with RNGGuard implementations that meet security specifications. Our evaluations show that RNGGuard is a practical approach to closing existing gaps in securing randomness sources in machine learning systems.