Recent advances in depthwise-separable convolutional neural networks (DS-CNNs) have led to novel architectures that surpass classical CNNs by a considerable margin in both scalability and accuracy. This paper reveals another striking property of DS-CNN architectures: discernible and explainable patterns emerge in the trained depthwise convolutional kernels of all layers. Through an extensive analysis of millions of trained filters of different sizes and from various models, we employed unsupervised clustering with autoencoders to categorize these filters. Astonishingly, the patterns converged into a few main clusters, each resembling a difference-of-Gaussians (DoG) function or its first- or second-order derivatives. Notably, we were able to classify over 95% and 90% of the filters from the state-of-the-art ConvNeXt V2 and ConvNeXt models, respectively. This finding is not merely a technological curiosity; it echoes the foundational models that neuroscientists have long proposed for the mammalian visual system. Our results thus deepen our understanding of the emergent properties of trained DS-CNNs and provide a bridge between artificial and biological visual processing systems. More broadly, they pave the way for more interpretable and biologically inspired neural network designs.
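To make the three reference shapes concrete, the sketch below samples a DoG kernel and its first and second partial derivatives on a small square grid. It is a minimal illustration, not the authors' clustering pipeline. The function names (`gaussian`, `dog`, `dog_kernel`), the 7×7 size, the widths σ₁ = 1 and σ₂ = 2, and the choice to differentiate only along x are all assumptions made for the example. It uses G_σ(x, y) = exp(−(x² + y²)/(2σ²)) / (2πσ²), DoG = G_σ₁ − G_σ₂, ∂G/∂x = −(x/σ²)G, and ∂²G/∂x² = (x²/σ⁴ − 1/σ²)G.

```python
import math


def gaussian(x, y, sigma):
    """Isotropic 2-D Gaussian, normalised to unit integral over the plane."""
    return math.exp(-(x * x + y * y) / (2 * sigma * sigma)) / (2 * math.pi * sigma * sigma)


def dog(x, y, s1, s2, order=0):
    """DoG value at (x, y), or its first/second partial derivative along x.

    order=0: G_s1 - G_s2
    order=1: d/dx (G_s1 - G_s2), using dG/dx = -x / s^2 * G
    order=2: d^2/dx^2 (G_s1 - G_s2), using d^2G/dx^2 = (x^2 / s^4 - 1 / s^2) * G
    """
    def term(s):
        g = gaussian(x, y, s)
        if order == 0:
            return g
        if order == 1:
            return -x / (s * s) * g
        if order == 2:
            return (x * x / s**4 - 1 / (s * s)) * g
        raise ValueError("order must be 0, 1 or 2")

    return term(s1) - term(s2)


def dog_kernel(size=7, s1=1.0, s2=2.0, order=0):
    """Sample a size x size kernel on an integer grid centred at the origin.

    Rows index y and columns index x, both running from -size//2 to size//2.
    Hypothetical helper: size and sigmas are illustrative defaults only.
    """
    r = size // 2
    return [[dog(x, y, s1, s2, order) for x in range(-r, r + 1)]
            for y in range(-r, r + 1)]


if __name__ == "__main__":
    for order in (0, 1, 2):
        print(f"order {order}:")
        for row in dog_kernel(order=order):
            print(" ".join(f"{v:+.4f}" for v in row))
```

With σ₁ < σ₂ the order-0 kernel has a positive centre and a negative surround (a centre-surround profile), the order-1 kernel is antisymmetric across the vertical axis, and the order-2 kernel is symmetric with a negative centre.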