The machine learning problem of extracting neural network parameters was proposed nearly three decades ago. Functionally equivalent extraction is a crucial goal of research on this problem. When the adversary has access to the raw output of the neural network, various attacks, including those presented at CRYPTO 2020 and EUROCRYPT 2024, have achieved this goal. However, it remains out of reach in the hard-label setting, where the raw output is inaccessible. In this paper, we propose the first attack that theoretically achieves functionally equivalent extraction in the hard-label setting, applicable to ReLU neural networks. We validate the attack's effectiveness through practical experiments on a wide range of ReLU neural networks, including networks trained on two real benchmark datasets widely used in computer vision (MNIST, CIFAR10). For a neural network with $10^5$ parameters, our attack requires only several hours on a single core.
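To make the query models concrete, the following minimal sketch (illustrative only, not the paper's attack) contrasts the raw-output oracle assumed by the CRYPTO 2020 and EUROCRYPT 2024 attacks with the hard-label oracle considered here, using a tiny ReLU network with hypothetical random weights:

```python
import numpy as np

# Hypothetical two-layer ReLU network with random parameters
# (stand-in for the victim model; not from the paper).
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal(4)  # hidden layer
W2, b2 = rng.standard_normal((2, 4)), rng.standard_normal(2)  # output layer

def raw_output(x):
    """Raw-output oracle: returns the full logit vector.
    Prior functionally-equivalent extraction attacks assume this access."""
    h = np.maximum(W1 @ x + b1, 0.0)  # ReLU activation
    return W2 @ h + b2

def hard_label(x):
    """Hard-label oracle: reveals only the predicted class index,
    so each query leaks far less information to the adversary."""
    return int(np.argmax(raw_output(x)))

x = rng.standard_normal(3)
print(raw_output(x))  # full logit vector (raw-output setting)
print(hard_label(x))  # single class index (hard-label setting)
```

The sketch highlights why the hard-label setting is harder: every query collapses a real-valued logit vector into one discrete label, and the attack must reconstruct the parameters from these labels alone.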