Linux is increasingly deployed in low Earth orbit on commercial off-the-shelf (COTS) systems-on-chip that were not designed for space radiation. Ionizing particles can trigger single-event functional interrupts (SEFIs) that crash the kernel without warning. Prior work has mainly measured board-level cross sections, leaving it unclear which Linux subsystems fail and how a single upset propagates into an operating-system-wide failure across architectures, stress conditions, and irradiation conditions. We address this gap by subjecting three Linux platforms to proton irradiation in the 20 to 58 MeV range: a Raspberry Pi Zero 2W with a 40 nm planar ARM Cortex-A53, an NXP i.MX 8M Plus with a 14 nm FinFET ARM Cortex-A53, and an OrangeCrab ECP5 FPGA hosting a VexRiscV RV32I soft core at 40 nm. Through kernel-log forensics, we trace all 133 observed Linux failures, most of which have not been previously reported, to their originating kernel handlers. Failure profiles differ sharply across process nodes. On the two 40 nm platforms, memory-management and driver handlers account for 67 to 78% of events, while on the 14 nm SoC approximately 90% of failures funnel through a single eMMC storage path, comprising 56% filesystem failures and 34% driver failures. This shows that a single SEFI-susceptible peripheral can dominate system reliability. The 14 nm SoC also shows a Linux SEFI cross section roughly an order of magnitude lower, although differences in irradiation geometry and DRAM exposure preclude isolating the contribution of process scaling. Reconstructed propagation chains show that, in severe events, faults can cascade through up to six kernel subsystems before terminal failure. Rather than motivating blanket redundancy, these results identify the kernel subsystem boundaries where radiation-induced faults originate, enabling targeted mitigations for hardening COTS Linux systems for orbit.