Deep Neural Networks (DNNs) have attracted significant attention, and their internal models are now regarded as valuable intellectual assets. Extracting these internal models through access to a DNN is conceptually similar to extracting a secret key via oracle access to a block cipher. Consequently, cryptanalytic techniques, particularly differential-like attacks, have been actively explored in recent years. ReLU-based DNNs are the most widely deployed architectures. While early works (e.g., Crypto 2020, Eurocrypt 2024) assume access to the exact output logits, which are typically not exposed in practice, more recent works (e.g., Asiacrypt 2024, Eurocrypt 2025) focus on the hard-label setting, where only the final classification result (e.g., "dog" or "car") is available to the attacker. Notably, Carlini et al. (Eurocrypt 2025) demonstrated that model extraction is feasible in polynomial time even under this restricted setting. In this paper, we first show that the assumptions underlying their attack become increasingly unrealistic as the depth of the target network grows: satisfying them requires a number of queries that is exponential in the attack depth, so the attack does not always run in polynomial time. To address this critical limitation, we propose a novel attack method called CrossLayer Extraction. Instead of directly extracting the secret parameters (e.g., weights and biases) of a specific neuron, which incurs exponential cost, we exploit neuron interactions across layers and extract this information from deeper layers. This technique significantly reduces query complexity and mitigates the limitations of existing model extraction approaches.