Inferring the exact parameters of a neural network with only query access is an NP-hard problem, with few practical algorithms known. A solution would have major implications for security, verification, interpretability, and understanding biological networks. The key challenges are the massive parameter space and the complex non-linear relationships between neurons. We resolve these challenges using two insights. First, we observe that almost all networks used in practice are produced by random initialization and first-order optimization, an inductive bias that drastically reduces the practical parameter space. Second, we present a novel query-generation algorithm that produces maximally informative samples, letting us untangle the non-linear relationships efficiently. We demonstrate reconstruction of a hidden network containing over 1.5 million parameters, and of one 7 layers deep, the largest and deepest reconstructions to date, with a maximum parameter difference of less than 0.0001, and we illustrate robustness and scalability across a variety of architectures, datasets, and training procedures.