In the rapidly evolving field of artificial intelligence, convolutional neural networks are essential for tackling complex challenges such as machine vision and medical diagnosis. To overcome the limited processing speed and high power consumption of conventional digital convolution, many optical components have recently been proposed to replace the digital convolution layer of a neural network and thereby accelerate machine vision tasks. Nonetheless, the analog nature of the optical convolution kernel has not been fully explored. Here, we develop a spatial-frequency-domain training method that creates arbitrarily shaped analog convolution kernels using an optical metasurface as the convolution layer, with a receptive field far larger than that of digital convolution kernels. By employing spatial multiplexing, we generate multiple parallel convolution kernels with both positive and negative weights under incoherent illumination. We experimentally demonstrate a classification accuracy of 98.59% on the MNIST dataset, and simulations show accuracies of 92.63% and 68.67% on the Fashion-MNIST and CIFAR-10 datasets, respectively, when additional digital layers are used. This work underscores the unique advantage of analog optical convolution and offers a promising avenue for accelerating machine vision tasks, especially on edge devices.
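To make the idea of signed kernels under incoherent illumination concrete, the following is a minimal numerical sketch, not the authors' implementation. It assumes (the abstract does not state this) that a signed kernel is realized as the difference of two non-negative kernels, each of which could act as an intensity point spread function, and that the two outputs are subtracted digitally. The convolution is evaluated in the spatial frequency domain with FFTs, so it is circular. All function names (`fft_conv2d`, `split_signed_kernel`, `signed_incoherent_conv`) and the 28×28 image and 7×7 kernel sizes are hypothetical choices for illustration.

```python
import numpy as np

def fft_conv2d(image, kernel):
    """Circular 2-D convolution computed in the spatial frequency domain."""
    # Zero-pad the kernel to the image size, then multiply the two spectra.
    padded = np.zeros_like(image, dtype=float)
    kh, kw = kernel.shape
    padded[:kh, :kw] = kernel
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)))

def split_signed_kernel(kernel):
    """Split a signed kernel into two non-negative parts (assumed scheme)."""
    pos = np.clip(kernel, 0, None)    # positive weights, zeros elsewhere
    neg = np.clip(-kernel, 0, None)   # magnitudes of negative weights
    return pos, neg

def signed_incoherent_conv(image, kernel):
    """Emulate a signed kernel with two intensity-only (non-negative) channels."""
    pos, neg = split_signed_kernel(kernel)
    # Each channel is a convolution with a non-negative kernel, as an
    # incoherent imaging system would produce; the subtraction is digital.
    return fft_conv2d(image, pos) - fft_conv2d(image, neg)

rng = np.random.default_rng(0)
image = rng.random((28, 28))            # non-negative intensity image
kernel = rng.standard_normal((7, 7))    # arbitrary signed kernel

direct = fft_conv2d(image, kernel)
two_channel = signed_incoherent_conv(image, kernel)
max_abs_diff = float(np.max(np.abs(direct - two_channel)))
```

Because convolution is linear, the two-channel result agrees with the direct signed convolution up to floating-point error, which is why a pair of spatially multiplexed non-negative kernels is sufficient in this simplified model.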