Three-dimensional convolutional neural networks (3D CNNs) have demonstrated remarkable performance in video recognition tasks by processing spatial and temporal features jointly. However, the cubic scaling of their computational complexity poses significant time and energy efficiency challenges for conventional silicon-based hardware. To address this, we propose a hybrid optoelectronic architecture that delegates the computationally intensive 3D convolutional layer to an opto-atomic Spatio-temporal Holographic Correlator (STHC). The system stores temporal information as atomic coherence in an array of inhomogeneously broadened cold rubidium-85 atoms and combines this atomic memory with a traditional 2D spatial correlator to perform correlation in space and time simultaneously. Our results on a four-class human action dataset demonstrate a classification accuracy of 59.72% using parallel large-scale kernels (30 × 40 pixels spatially, 8 frames temporally), with projected operating speeds of up to 125,000 frames per second. This hybrid approach offers a pathway to massively accelerated video classification.
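As a rough digital analogue of the joint space-time correlation described above, the following NumPy sketch computes a 3D cross-correlation through the Fourier correlation theorem. It is an illustration only and not the optical or atomic implementation of the STHC: the function name `correlate3d_fft`, the 16 × 60 × 80 video volume, and the random test kernel are assumptions made for the example; only the kernel size (8 frames, 30 × 40 pixels) is taken from the text. The correlation is circular, since the kernel is zero-padded to the video size before transforming.

```python
import numpy as np

def correlate3d_fft(video, kernel):
    """Circular 3D cross-correlation of a real video volume (T, H, W)
    with a real kernel (t, h, w), using the correlation theorem:
    corr = IFFT( FFT(video) * conj(FFT(kernel)) )."""
    V = np.fft.fftn(video)
    # Zero-pad the kernel to the video shape so the spectra align.
    K = np.fft.fftn(kernel, s=video.shape)
    return np.real(np.fft.ifftn(V * np.conj(K)))

# Hypothetical demo: hide one copy of the kernel inside an empty video
# and check that the correlation peak recovers its space-time offset.
rng = np.random.default_rng(0)
kernel = rng.random((8, 30, 40))      # 8 frames, 30 x 40 pixels (from the text)
video = np.zeros((16, 60, 80))        # assumed video size, for illustration
video[5:13, 10:40, 20:60] = kernel    # embed at (frame 5, row 10, column 20)

corr = correlate3d_fft(video, kernel)
peak = np.unravel_index(np.argmax(corr), corr.shape)
print("correlation peak at (frame, row, col):", peak)

# A direct evaluation needs one multiply-accumulate per kernel element
# for every output position, which is the cost a correlator avoids.
print("multiply-accumulates per output position:", kernel.size)
```

The location of the peak gives the frame, row, and column at which the kernel pattern occurs in the video, which is the quantity a space-time correlator reports for each kernel.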