Neural network model extraction has recently emerged as an important security concern, as adversaries attempt to recover a network's parameters via black-box queries. At CRYPTO'20, Carlini et al. proposed a model extraction approach consisting of two steps: signature extraction and sign extraction. In practice, however, their signature-extraction method is limited to very shallow networks, and their sign-extraction method runs in exponential time. Recently, Canales-Martinez et al. (Eurocrypt'24) proposed a polynomial-time sign-extraction method, but it assumes that the corresponding signatures have already been successfully extracted, and it can fail on so-called low-confidence neurons. In this work, we first revisit and refine the signature extraction process by systematically identifying and addressing, for the first time, critical limitations of Carlini et al.'s signature-extraction method, including rank deficiency and noise propagation from deeper layers. We propose an efficient algorithmic solution for each of the identified issues, which permits the extraction of much deeper networks than previously possible. In addition, we propose new methods to improve numerical precision in signature extraction, and we enhance sign extraction by combining two polynomial-time methods, avoiding an exponential exhaustive search in the case of low-confidence neurons. This leads to the first end-to-end model extraction method that runs in polynomial time. We validate our attack through extensive experiments on ReLU-based neural networks, demonstrating significant improvements in extraction depth. For instance, our attack consistently extracts at least eight layers of neural networks trained on either the MNIST or CIFAR-10 datasets, while previous works could barely extract the first three layers of networks of similar width.
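To make the term "signature" concrete, the following is a minimal sketch of critical-point-based signature recovery on a toy one-hidden-layer ReLU network, as we understand the general idea behind this family of attacks. It is not the algorithm of this work, and it ignores the rank-deficiency and noise issues that arise in deeper layers. The weights, the query segment, and all function names (`oracle`, `find_critical_point`, `extract_signature`) are hypothetical and chosen only for illustration. The sketch also assumes that exactly one neuron changes state along the query segment and that the first weight of that neuron is nonzero.

```python
# Toy illustration only: weights, sizes, and helper names are hypothetical.

# Hidden one-layer ReLU network that plays the role of the black box.
_W = [[1.0, -2.0, 0.5],
      [0.5, 1.0, -1.5],
      [-1.0, 0.5, 2.0]]
_B = [0.3, -0.2, 0.1]
_A = [1.0, -0.7, 0.4]
_C = 0.05


def oracle(x):
    """Black-box query: the attacker sees only the scalar output f(x)."""
    out = _C
    for row, b, a in zip(_W, _B, _A):
        pre = sum(w * xi for w, xi in zip(row, x)) + b
        out += a * max(pre, 0.0)
    return out


def point_on_line(x0, x1, t):
    """Return x0 + t * (x1 - x0)."""
    return [p + t * (q - p) for p, q in zip(x0, x1)]


def find_critical_point(x0, x1, delta=1e-4):
    """Locate the kink of f on the segment x0 -> x1.

    Assumption: exactly one neuron changes state on the segment, so f
    restricted to the line consists of two linear pieces, and the kink
    is where the two pieces intersect.
    """
    def g(t):
        return oracle(point_on_line(x0, x1, t))

    slope_left = (g(delta) - g(0.0)) / delta
    slope_right = (g(1.0) - g(1.0 - delta)) / delta
    # Solve g(0) + slope_left * t == g(1) + slope_right * (t - 1) for t.
    t_star = (g(1.0) - g(0.0) - slope_right) / (slope_left - slope_right)
    return point_on_line(x0, x1, t_star)


def second_difference(x, direction, eps):
    """f(x + eps*d) + f(x - eps*d) - 2 f(x); linear parts of f cancel."""
    plus = [xi + eps * di for xi, di in zip(x, direction)]
    minus = [xi - eps * di for xi, di in zip(x, direction)]
    return oracle(plus) + oracle(minus) - 2.0 * oracle(x)


def extract_signature(x_star, eps=1e-3):
    """Recover the critical neuron's weight row divided by its first entry.

    Assumption: the first weight of that neuron is nonzero.
    """
    n = len(x_star)
    basis = [[1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]
    # At a critical point, the second difference along e_i is
    # proportional to |w_i|.
    mags = [abs(second_difference(x_star, e, eps)) for e in basis]
    signature = [1.0]
    for i in range(1, n):
        # Along e_0 + e_i the magnitude is proportional to |w_0 + w_i|,
        # which reveals whether w_i has the same sign as w_0.
        both = [basis[0][j] + basis[i][j] for j in range(n)]
        joint = abs(second_difference(x_star, both, eps))
        dist_same = abs(joint - (mags[0] + mags[i]))
        dist_opposite = abs(joint - abs(mags[0] - mags[i]))
        ratio = mags[i] / mags[0]
        signature.append(ratio if dist_same < dist_opposite else -ratio)
    return signature


x_star = find_critical_point([0.0, 0.0, 0.0], [-1.0, 0.0, 0.0])
signature = extract_signature(x_star)
```

On this toy network, `x_star` lies close to `[-0.3, 0.0, 0.0]` and `signature` is approximately `[1.0, -2.0, 0.5]`, which is the first hidden neuron's weight row divided by its first entry. The overall scale and the sign of the row remain unknown at this stage, which is why a separate sign-extraction step is needed.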