Neural networks have emerged as powerful tools across a wide range of applications, yet their decision-making process often remains opaque, leading to their perception as "black boxes." This opacity raises concerns about their interpretability and reliability, especially in safety-critical scenarios. Network inversion techniques offer a solution by allowing us to peek inside these black boxes, revealing the features and patterns that the networks have learned and rely on in their decision-making, thereby providing valuable insight into how neural networks arrive at their conclusions and making them more interpretable and trustworthy. This paper presents a simple yet effective approach to network inversion using a carefully conditioned generator that learns the data distribution in the input space of the trained neural network, enabling the reconstruction of inputs that would most likely lead to the desired outputs. To capture the diversity of the input space for a given output, instead of simply revealing the conditioning labels to the generator, we implicitly encode the conditioning label information into vectors; this is further reinforced by heavy dropout in the generation process and by minimisation of the cosine similarity between the features corresponding to the generated images. The paper concludes with immediate applications of Network Inversion, including interpretability, explainability, and the generation of adversarial samples.
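The diversity mechanisms described above can be illustrated with a minimal sketch. The code below is a hypothetical PyTorch illustration, not the paper's implementation: a learned embedding stands in for the implicit (vector-encoded) label conditioning, heavy dropout is applied inside the generator, and a pairwise cosine-similarity penalty over features of generated images serves as the diversity loss. All layer sizes and names are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionedGenerator(nn.Module):
    """Toy conditioned generator (illustrative, not the paper's architecture).

    The class label is hidden behind a learned embedding rather than
    revealed as a one-hot vector, and heavy dropout in the generation
    path encourages varied outputs for the same conditioning label.
    """

    def __init__(self, num_classes=10, code_dim=32, img_dim=784):
        super().__init__()
        self.embed = nn.Embedding(num_classes, code_dim)  # implicit label encoding
        self.net = nn.Sequential(
            nn.Linear(code_dim + code_dim, 256),
            nn.ReLU(),
            nn.Dropout(p=0.5),  # heavy dropout in the generation process
            nn.Linear(256, img_dim),
            nn.Tanh(),
        )

    def forward(self, labels, noise):
        cond = self.embed(labels)
        return self.net(torch.cat([cond, noise], dim=1))

def diversity_loss(features):
    """Mean pairwise cosine similarity among feature vectors.

    Minimising this quantity pushes the features of generated images
    apart, so the generator covers more of the input space for a
    given output label.
    """
    f = F.normalize(features, dim=1)
    sim = f @ f.t()                    # pairwise cosine similarities
    n = sim.size(0)
    off_diag = sim - torch.eye(n)      # drop self-similarity on the diagonal
    return off_diag.sum() / (n * (n - 1))

gen = ConditionedGenerator()
labels = torch.randint(0, 10, (8,))
noise = torch.randn(8, 32)
images = gen(labels, noise)
loss = diversity_loss(images)
print(images.shape, float(loss))
```

In a full inversion setup this diversity term would be combined with the conditioning objective on the trained classifier's outputs; here only the two ingredients named in the abstract are shown.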