Researchers typically investigate neural network representations by examining the activation outputs of one or more layers of a network. Here, we investigate whether ReLU activation patterns, encoded as bit vectors, can aid in understanding and interpreting the behavior of neural networks. We use Representational Dissimilarity Matrices (RDMs) to investigate the coherence of data within the embedding spaces of a deep neural network. From each layer of the network, we extract bit vectors and use them to compute similarity scores between images. From these scores, we build a similarity matrix for a collection of images drawn from two classes, and then apply Fiedler partitioning to the associated Laplacian matrix to separate the classes. Our results indicate that, as measured through bit vector representations, the network continues to refine class detectability, with the last ReLU layer achieving better than 95% separation accuracy. Additionally, we demonstrate that bit vectors aid in adversarial image detection: a simple classifier separates adversarial from non-adversarial images with over 95% accuracy.
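The pipeline described above (bit vectors, pairwise similarity, Laplacian, Fiedler partition) can be illustrated with a minimal sketch. Several details here are assumptions rather than the method used in this work: the similarity score is taken to be the fraction of matching bits (one minus the normalized Hamming distance), the Laplacian is the unnormalized form L = D - W, and the input is synthetic pre-activation data rather than activations from a trained network. All function names are hypothetical.

```python
import numpy as np


def relu_bit_vectors(pre_activations):
    """Encode a ReLU activation pattern: 1 where the unit fires (input > 0), else 0."""
    return (np.asarray(pre_activations) > 0).astype(np.uint8)


def bit_similarity(bits):
    """Pairwise fraction of matching bits (assumed score: 1 - normalized Hamming distance)."""
    b = bits.astype(np.float64)
    # Count positions where both bits are 1, plus positions where both are 0.
    matches = b @ b.T + (1.0 - b) @ (1.0 - b).T
    return matches / bits.shape[1]


def fiedler_partition(similarity):
    """Split items into two groups by the sign of the Fiedler vector of L = D - W."""
    degree = np.diag(similarity.sum(axis=1))
    laplacian = degree - similarity
    # eigh returns eigenvalues in ascending order; column 1 is the Fiedler vector.
    _, eigvecs = np.linalg.eigh(laplacian)
    return (eigvecs[:, 1] >= 0).astype(int)


def separation_accuracy(partition, labels):
    """Accuracy of a two-way partition, invariant to swapping the group names."""
    agree = np.mean(partition == labels)
    return max(agree, 1.0 - agree)


def demo(seed=0, n_per_class=20, n_units=64):
    """Synthetic stand-in for one layer: each class has its own mean pre-activation pattern."""
    rng = np.random.default_rng(seed)
    class_patterns = rng.choice([-1.0, 1.0], size=(2, n_units))
    labels = np.repeat([0, 1], n_per_class)
    noise = 0.3 * rng.standard_normal((2 * n_per_class, n_units))
    pre_activations = class_patterns[labels] + noise

    bits = relu_bit_vectors(pre_activations)
    partition = fiedler_partition(bit_similarity(bits))
    return separation_accuracy(partition, labels)


if __name__ == "__main__":
    print(f"separation accuracy on synthetic data: {demo():.2f}")
```

In practice, `pre_activations` would be replaced by the inputs to a given ReLU layer for each image, and the same partition step would be repeated per layer to compare separation accuracy across the network.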