IoT botnet detection has advanced, yet most published systems are validated on a single dataset and rarely generalise across environments. Heterogeneous feature spaces make multi-dataset training practically impossible without discarding semantic interpretability or introducing data integrity violations. No prior work has addressed both problems with a formally specified, reproducible methodology. This paper does. We introduce BRIDGE (Benchmark Reference for IoT Domain Generalisation Evaluation), the first formally specified heterogeneous multi-dataset benchmark for IoT intrusion detection. BRIDGE unifies CICIDS-2017, CIC-IoT-2023, Bot-IoT, Edge-IIoTset, and N-BaIoT through a 46-feature semantic canonical vocabulary grounded in CICFlowMeter nomenclature. Features are mapped only where they are genuinely equivalent, unmapped features are explicitly zero-filled, and per-dataset coverage of the vocabulary ranges from 15% to 93%. A leave-one-dataset-out (LODO) protocol makes the generalisation gap precisely measurable: all five evaluated architectures achieve mean LODO F1 between 0.39 and 0.47, and we establish the first community generalisation baseline at mean LODO F1 = 0.5577, a result that shifts the agenda from single-benchmark optimisation toward cross-environment generalisation. We also propose TCH-Net, a multi-branch network that fuses three branches via Cross-Branch Gated Attention Fusion (CB-GAF), which uses learnable sigmoid gates for dynamic feature-wise mixing: a three-path Temporal branch (residual convolutional-BiGRU, stride-downsampled BiGRU, pre-LayerNorm Transformer), a provenance-conditioned Contextual branch, and a Statistical branch. Across five random seeds, TCH-Net achieves F1 = 0.8296 ± 0.0028, AUC = 0.9380 ± 0.0025, and MCC = 0.6972 ± 0.0056, outperforming all twelve baselines (p < 0.05, Wilcoxon) and recording the highest LODO F1 overall. BRIDGE and the full pipeline are available at https://github.com/Ammar-ss/TCH-Net.
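To make the benchmark construction concrete, the following is a minimal sketch of canonical-vocabulary alignment with explicit zero-filling, per-dataset coverage, and LODO split enumeration. It is an illustration under stated assumptions, not the released pipeline: the four-feature vocabulary, the dataset names `dataset_a` and `dataset_b`, the native column names, and the helper functions `align`, `coverage`, and `lodo_splits` are all hypothetical.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical 4-feature canonical vocabulary (BRIDGE itself uses 46 features).
CANONICAL: List[str] = ["flow_duration", "tot_fwd_pkts", "tot_bwd_pkts", "flow_byts_s"]

# Hypothetical maps from canonical name to native column name. A canonical
# feature is listed only when the native column is a genuine equivalent.
FEATURE_MAPS: Dict[str, Dict[str, str]] = {
    "dataset_a": {
        "flow_duration": "Flow Duration",
        "tot_fwd_pkts": "Total Fwd Packets",
        "flow_byts_s": "Flow Bytes/s",
    },
    "dataset_b": {"flow_duration": "dur"},
}


def align(row: Dict[str, float], mapping: Dict[str, str]) -> List[float]:
    """Project one native record onto the canonical vocabulary.

    Canonical features without a mapped native column are zero-filled.
    """
    return [float(row[mapping[name]]) if name in mapping else 0.0 for name in CANONICAL]


def coverage(mapping: Dict[str, str]) -> float:
    """Fraction of canonical features that the dataset actually provides."""
    return sum(name in mapping for name in CANONICAL) / len(CANONICAL)


def lodo_splits(datasets: List[str]) -> Iterator[Tuple[List[str], str]]:
    """Yield (training datasets, held-out dataset) for each LODO fold."""
    for held_out in datasets:
        yield [d for d in datasets if d != held_out], held_out


if __name__ == "__main__":
    print(align({"dur": 2.5, "pkts": 10}, FEATURE_MAPS["dataset_b"]))
    for name, mapping in FEATURE_MAPS.items():
        print(name, f"coverage={coverage(mapping):.0%}")
    for train, test in lodo_splits(list(FEATURE_MAPS)):
        print("train:", train, "held out:", test)
```

In this sketch, a native column with no canonical equivalent (such as `pkts` above) is dropped rather than approximately mapped, which is one way to read the genuine-equivalence-only rule.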
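The idea of sigmoid-gated, feature-wise mixing can be illustrated with a generic two-input gate. This is a simplified sketch and not the CB-GAF module itself, whose exact formulation is not given here: the function `gated_fusion`, its per-feature weights `w_a`, `w_b`, and `bias`, and the restriction to two branches are all assumptions made for illustration.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(
    a: List[float],
    b: List[float],
    w_a: List[float],
    w_b: List[float],
    bias: List[float],
) -> List[float]:
    """Mix two branch outputs feature by feature (hypothetical form).

    For each feature i:
        g_i     = sigmoid(w_a[i] * a[i] + w_b[i] * b[i] + bias[i])
        fused_i = g_i * a[i] + (1 - g_i) * b[i]
    In a trained network w_a, w_b, and bias would be learned parameters.
    """
    fused = []
    for a_i, b_i, wa, wb, c in zip(a, b, w_a, w_b, bias):
        g = sigmoid(wa * a_i + wb * b_i + c)  # gate depends on the inputs
        fused.append(g * a_i + (1.0 - g) * b_i)
    return fused


if __name__ == "__main__":
    temporal = [1.0, 3.0]      # stand-in for one branch's features
    statistical = [3.0, 1.0]   # stand-in for another branch's features
    zeros = [0.0, 0.0]
    # Zero weights and bias give g = 0.5, i.e. a plain average.
    print(gated_fusion(temporal, statistical, zeros, zeros, zeros))
```

Because each gate is computed from the inputs, the mixing ratio can differ per feature and per sample, which is the sense in which such a gate is dynamic.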