The increasing use of Linux on commercial off-the-shelf (COTS) systems-on-chip (SoCs) in spaceborne computing brings with it the susceptibility of COTS parts to radiation-induced failures such as soft errors. Modern SoCs exacerbate the problem: aggressive transistor scaling lowers the critical charge needed to induce a soft error and intensifies radiation effects among densely packed transistors, degrading overall reliability. Linux's monolithic architecture amplifies these risks, because tightly coupled kernel subsystems can propagate errors to critical components (e.g., memory management), while limited error-correcting code (ECC) coverage offers minimal mitigation. Furthermore, the lack of public soft error data from irradiation tests on COTS SoCs running Linux hinders reliability improvements. This study evaluates the effects of proton irradiation (20-50 MeV) on Linux across three COTS SoC architectures: Raspberry Pi Zero 2 W (40 nm CMOS, Cortex-A53), NXP i.MX 8M Plus (14 nm FinFET, Cortex-A53), and OrangeCrab (40 nm FPGA, RISC-V). Irradiation results show that the 14 nm FinFET NXP SoC, running without ECC memory, sustained 2-3x longer Linux uptime than both 40 nm CMOS counterparts, partially due to FinFET's reduced charge collection. Additionally, this work presents the first cross-architecture analysis of soft error-prone Linux kernel components in modern SoCs, with the aim of developing targeted mitigations. The findings establish foundational data on Linux's soft error sensitivity in COTS SoCs and can guide mission-readiness assessments for space applications.