Machine learning relies on randomness as a fundamental component of steps such as data sampling, data augmentation, weight initialization, and optimization. Most machine learning frameworks use pseudorandom number generators (PRNGs) as their source of randomness. However, variations in design choices and implementations across frameworks, software dependencies, and hardware backends, combined with a lack of statistical validation, can open previously unexplored attack vectors on machine learning systems. Attacks on randomness sources can be extremely covert and have a history of exploitation in real-world systems. In this work, we examine the role of randomness in the machine learning development pipeline from an adversarial point of view and analyze the PRNG implementations in major machine learning frameworks. We present RNGGuard, a tool that helps machine learning engineers secure their systems with low effort. RNGGuard statically analyzes a target library's source code to identify random functions and the modules that use them; at runtime, it enforces secure execution by replacing insecure function calls with RNGGuard implementations that meet security specifications. Our evaluations show that RNGGuard is a practical approach to closing existing gaps in securing randomness sources in machine learning systems.