Vision processing units and other commercial neural-network inference accelerators are increasingly deployed in safety-relevant edge applications, but their fault response under transient hardware disturbances remains poorly characterized in the open literature. For the Intel Movidius Myriad X, packaged as the Intel Neural Compute Stick 2 (NCS2), only a single feasibility study has been published. We report a systematic single-pulse electromagnetic fault injection (EMFI) campaign on the NCS2 running three ImageNet-trained convolutional neural networks (ResNet-18, ResNet-50, VGG-11) on the OpenVINO runtime. Across 1,536 spot-test trials at characterized hotspots and approximately 16,000 parameter-search trials, single pulses produce four reproducible outcome classes: no measured accuracy change, minor silent data corruption (SDC), major degradation that persists across subsequent inferences until model reload, and device hangs that require USB power-cycling. We interpret these classes, respectively, as no-effect, SDC with possible single-event-transient-like (SET-like) or small persistent-state mechanisms, single-event-upset-like (SEU-like) persistent corruption, and single-event-functional-interrupt-like (SEFI-like) loss of functionality. Two findings are central. First, the major-degradation class can be induced in 18-31% of trials at characterized hotspots, with post-collapse top-1 accuracy below five percent and persistence across all subsequent inferences until explicit model reload, a regime that no inference-API-level mechanism detects. Second, this regime can also be induced by pulses delivered to an idle device with the model already loaded, demonstrating that load-time integrity checks alone are insufficient. We discuss mitigation strategies graded by outcome class, focusing on mechanisms that can be implemented at the application level without modifying the device firmware or the OpenVINO runtime.
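To make the idea of an application-level check concrete, the following is a minimal sketch of one possible approach, a periodic "canary" inference that compares the device's output on known inputs against labels recorded after a trusted load and reloads the model on mismatch. It is an illustration under stated assumptions, not necessarily one of the mitigations discussed in the paper: `CanaryGuard`, `infer`, `reload_model`, and `check_every` are hypothetical names, and the two callables stand in for whatever inference and model-reload routines the application already has.

```python
from typing import Callable, List, Sequence

Scores = Sequence[float]


def top1(scores: Scores) -> int:
    """Index of the highest score (the predicted class)."""
    return max(range(len(scores)), key=lambda i: scores[i])


class CanaryGuard:
    """Hypothetical wrapper: re-checks known inputs and reloads on mismatch."""

    def __init__(
        self,
        infer: Callable[[Scores], Scores],   # assumed: runs one inference on the device
        reload_model: Callable[[], None],    # assumed: reloads the model onto the device
        canaries: Sequence[Scores],          # fixed reference inputs chosen by the application
        check_every: int = 1,                # run the canary check every N calls
    ) -> None:
        self.infer = infer
        self.reload_model = reload_model
        self.canaries: List[Scores] = list(canaries)
        self.check_every = check_every
        # Golden labels are recorded once, right after a trusted model load.
        self.golden = [top1(infer(x)) for x in self.canaries]
        self.calls = 0
        self.reloads = 0

    def canaries_ok(self) -> bool:
        """True if every canary still yields its recorded top-1 label."""
        return all(
            top1(self.infer(x)) == g for x, g in zip(self.canaries, self.golden)
        )

    def __call__(self, x: Scores) -> Scores:
        # Check before serving the request, so corruption that occurred
        # while the device was idle is caught on a checked call.
        if self.calls % self.check_every == 0 and not self.canaries_ok():
            self.reload_model()
            self.reloads += 1
        self.calls += 1
        return self.infer(x)
```

With `check_every=1` the check runs before every request, which also covers pulses that arrive while the device is idle, at the cost of extra inferences per call. Because the reported post-collapse top-1 accuracy is below five percent, a handful of canaries is likely, though not guaranteed, to reveal the persistent-degradation class. This sketch does not address device hangs, and a label-only comparison may miss minor silent data corruption.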