BusyBox is one of the most widely reused userland components in Linux-based Internet-of-Things (IoT) firmware, yet its security assessment remains difficult because firmware images are frequently stripped, vendor patch practices are inconsistent, and the same source component is compiled for heterogeneous architectures. We propose EvoPatch-IoT, an evolution-aware cross-architecture retrieval framework for stripped BusyBox firmware binaries. EvoPatch-IoT combines anonymous instruction/context features, graph-level statistics, per-binary geometric priors, and historical function prototypes to localize homologous and potentially vulnerable functions without relying on symbols, source paths, or version strings at test time. We further construct a large-scale BusyBox benchmark from 57 historical versions, 270 unstripped binaries, 285 stripped binaries, and 130 source releases, yielding 1,550,752 function-symbol rows, 1,290,369 analysis-function rows, and 155,845 high-confidence stripped-to-unstripped matches. On 57 fully covered versions and 1,020 directed architecture pairs, EvoPatch-IoT achieves a weighted Hit@1 of 34.56\% and Hit@10 of 56.24\%, outperforming the strongest baseline by 16.04\% and 26.85\%, respectively, and reducing the expected manual inspection space by 98.98\%. The method is best on 56 of 57 versions and maintains consistent advantages on difficult architecture pairs. In addition, a version-change transfer study reaches a mean ROC-AUC of 0.9887, and a CVE-2021-42386 patch-state proxy obtains 82.44\% mean accuracy and 88.47\% mean F1 across held-out architectures. These results show that evolution-aware binary retrieval is a practical foundation for scalable IoT firmware vulnerability auditing.
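To make the reported retrieval metric concrete, the following minimal sketch shows one way a weighted Hit@k could be aggregated over evaluation groups such as directed architecture pairs. The abstract does not state the weighting scheme, so weighting each group by its number of queries is an assumption here; the names `hit_at_k`, `weighted_hit_at_k`, and the `groups` layout are hypothetical and not taken from EvoPatch-IoT.

```python
from typing import Dict, List, Sequence, Tuple


def hit_at_k(ranked: Sequence[str], truth: str, k: int) -> bool:
    """Return True if the ground-truth function is among the top-k candidates."""
    return truth in ranked[:k]


def weighted_hit_at_k(
    groups: Dict[str, List[Tuple[List[str], str]]], k: int
) -> float:
    """Query-count-weighted mean of per-group Hit@k.

    `groups` maps a group label (for example a directed architecture pair)
    to a list of (ranked_candidates, ground_truth) queries.
    ASSUMPTION: each group's weight is its number of queries; the paper
    may use a different weighting.
    """
    total_queries = 0
    weighted_sum = 0.0
    for queries in groups.values():
        if not queries:
            continue  # an empty group contributes no weight
        hits = sum(hit_at_k(ranked, truth, k) for ranked, truth in queries)
        group_rate = hits / len(queries)
        weighted_sum += group_rate * len(queries)
        total_queries += len(queries)
    return weighted_sum / total_queries if total_queries else 0.0
```

Under this assumed weighting the result equals the pooled hit rate over all queries, so large groups dominate the score; a macro average over groups would instead weight every architecture pair equally.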