Scalable, Energy-Efficient Optical-Neural Architecture for Multiplexed Deepfake Video Detection

The rapid proliferation of AI-generated visual media has created an urgent need for efficient, trustworthy deepfake detection systems. However, existing deep-learning-based detection methods rely on computationally intensive, energy-demanding inference, which limits their scalability. Here, we present a hybrid digital-analog deepfake video detection framework that combines a lightweight digital front-end with a spatially multiplexed optical decoding back-end, performing massively parallel analog inference through a programmable spatial light modulator. By processing 15 or more video streams simultaneously within a single optical propagation pass, the system delivers high-throughput, accurate video-level authenticity prediction at lower computational cost than purely digital methods. We validated this hybrid deepfake video processor on datasets spanning classical face-swapping, real-world deepfake recordings, and fully AI-generated videos. Using a spatially multiplexed experimental set-up operating in the visible spectrum, we achieved average deepfake detection accuracy, sensitivity and specificity of 97.79%, 99.86% and 95.72%, respectively, on the Celeb-DF video dataset, with 15 videos tested in parallel in a single optical pass per inference. The multiplexed optical decoder is also resilient to various types of video degradation, noise, compression, experimental misalignment and black-box adversarial attacks. Our results show that integrating optical computation into AI inference enables simultaneous gains in throughput, energy efficiency, and adversarial robustness, three properties that are difficult to achieve together in purely digital systems.
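As a reading aid for the three reported metrics, the following minimal Python sketch shows how accuracy, sensitivity and specificity are conventionally derived from confusion counts. It assumes that "deepfake" is the positive class, which the abstract does not state. The counts in the usage example are hypothetical, chosen only so that the output matches the reported percentages; they are not the authors' data. With equally many real and fake videos, accuracy equals the mean of sensitivity and specificity, and the reported figures are consistent with that relation, since (99.86 + 95.72) / 2 = 97.79.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Video-level confusion counts (deepfake = positive class, an assumption)."""
    tp: int  # fake videos flagged as fake
    fn: int  # fake videos missed
    tn: int  # real videos accepted as real
    fp: int  # real videos wrongly flagged as fake


def metrics(c: Confusion) -> tuple[float, float, float]:
    """Return (accuracy, sensitivity, specificity) as fractions in [0, 1]."""
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    accuracy = (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)
    return accuracy, sensitivity, specificity


# Hypothetical, class-balanced counts (10,000 fake and 10,000 real videos),
# picked only to illustrate the arithmetic; not taken from the paper.
example = Confusion(tp=9986, fn=14, tn=9572, fp=428)
acc, sens, spec = metrics(example)
print(f"accuracy={acc:.2%} sensitivity={sens:.2%} specificity={spec:.2%}")
```

On an imbalanced test set the same sensitivity and specificity would give a different accuracy, so the class ratio matters when comparing such figures across datasets.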
