Convolution goes higher-order: a biologically inspired mechanism empowers image classification

We propose a novel approach to image classification inspired by the complex nonlinear processing of biological vision, in which classical convolutional neural networks (CNNs) are equipped with learnable higher-order convolutions. Our model incorporates a Volterra-like expansion of the convolution operator, capturing multiplicative interactions akin to those observed in early and advanced stages of biological visual processing. We evaluate this approach on synthetic datasets, where we test its sensitivity to higher-order correlations, and on standard benchmarks (MNIST, FashionMNIST, CIFAR10, CIFAR100 and Imagenette). Our architecture outperforms traditional CNN baselines and achieves optimal performance with expansions up to 3rd/4th order, aligning remarkably well with the distribution of pixel intensities in natural images. Through systematic perturbation analysis, we validate this alignment by isolating the contributions of specific image statistics to model performance, demonstrating how different orders of convolution process distinct aspects of visual information. Furthermore, Representational Similarity Analysis reveals distinct geometries across network layers, indicating qualitatively different modes of visual information processing. Our work bridges neuroscience and deep learning, offering a path towards more effective, biologically inspired computer vision models. It provides insights into visual information processing and lays the groundwork for neural networks that better capture complex visual patterns, particularly in resource-constrained scenarios.
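
To make the idea of a Volterra-like expansion concrete, the following is a minimal sketch of a second-order convolution on a single-channel image. It assumes the generic truncated Volterra form y = b + sum_i w1[i] x[i] + sum_{i,j} w2[i,j] x[i] x[j], where x is a flattened k x k patch. The function names (extract_patches, volterra_conv2d) and the dense parameterization of w2 are illustrative assumptions, not the parameterization used in the paper.

```python
import numpy as np


def extract_patches(image, k):
    """Return every k x k patch of a 2-D image (valid mode), flattened.

    Output shape: (H - k + 1, W - k + 1, k * k).
    """
    windows = np.lib.stride_tricks.sliding_window_view(image, (k, k))
    return windows.reshape(windows.shape[0], windows.shape[1], k * k)


def volterra_conv2d(image, w1, w2, bias=0.0):
    """Second-order Volterra-style convolution (illustrative sketch).

    image: 2-D array of shape (H, W)
    w1:    first-order kernel of shape (k, k), the usual linear filter
    w2:    second-order kernel of shape (k*k, k*k), weighting products
           of pixel pairs within each patch (hypothetical dense form)
    """
    k = w1.shape[0]
    patches = extract_patches(image, k)            # (H', W', k*k)
    linear = patches @ w1.ravel()                  # standard cross-correlation
    # Quadratic term: sum_{i,j} w2[i, j] * x[i] * x[j] for each patch x.
    quadratic = np.einsum("hwi,ij,hwj->hw", patches, w2, patches)
    return bias + linear + quadratic


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.normal(size=(8, 8))
    w1 = rng.normal(size=(3, 3))
    w2 = 0.1 * rng.normal(size=(9, 9))
    print(volterra_conv2d(img, w1, w2).shape)
```

With w2 set to zero the operation reduces to an ordinary linear convolution (cross-correlation), so the first-order term is the classical CNN filter and each additional order adds multiplicative interactions between pixels. Third- and fourth-order terms would follow the same pattern with higher-rank weight tensors, at a correspondingly higher parameter cost.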
