Federated Naive Bayes with Real Mixture of Gaussians and Institutional Governance Regularization for Network Intrusion Detection

Federated learning for intrusion detection rests on a flawed premise: that every participating institution contributes equally to the shared model. In practice, a financial institution with mature security controls and low vulnerability exposure produces fundamentally different data from a government agency operating with weaker controls and higher exposure. Treating their local models as equivalent discards information that organisations already collect through standard risk management audits. Four governance indicators from ISACA's CRISC framework, namely control maturity (CMM), proportion of implemented controls (KCI), risk indicator activation frequency (KRI), and mean vulnerability score (CVSS), are combined here into an Institutional Coherence Index (ICC). This index enters a Nelder-Mead federated weight optimizer as a regularization prior, guiding weight assignment toward institutional quality without imposing any fixed allocation. Each node trains a hybrid local classifier combining Categorical and Gaussian Naive Bayes. The server combines the local distributions as a real Mixture of Gaussians, preserving each node's statistical identity rather than collapsing it into a global parameter vector. Validation on NSL-KDD (2009), CIC-IDS2017 (2017), and UNSW-NB15 (2015), under seven Dirichlet heterogeneity levels, shows that the ICC-regularized proposal outperforms size-proportional federated averaging on all three datasets, with F1-macro scores of 0.9135 vs. 0.9076 (+0.0059), 0.7556 vs. 0.6771 (+0.0785), and 0.2110 vs. 0.2060 (+0.0050), respectively. The difference is statistically significant in 70 of 94 configurations (McNemar test, p < 0.05). On all three datasets, the optimizer assigned the highest weight to the institutionally most mature node and the lowest to the least mature, without any explicit ordering constraint.
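To make concrete the difference between combining local Gaussians as a mixture and collapsing them into a single parameter vector, the following minimal sketch may help. It is an illustration under stated assumptions, not the implementation evaluated here: the function names, the two-node toy parameters, and the equal weights are all hypothetical, and the weights are passed in as fixed inputs rather than produced by the ICC-regularized Nelder-Mead optimizer.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_log_likelihood(x, node_params, weights):
    """log p(x | class) with each feature modelled as a mixture of node Gaussians.

    node_params[k][j] is the (mean, std) that node k estimated for feature j.
    weights[k] is the federated weight of node k (non-negative, summing to 1).
    Features are treated as conditionally independent (Naive Bayes assumption).
    """
    total = 0.0
    for j, value in enumerate(x):
        density = sum(
            w * gaussian_pdf(value, params[j][0], params[j][1])
            for w, params in zip(weights, node_params)
        )
        total += math.log(density + 1e-300)  # tiny constant guards against log(0)
    return total


def collapsed_log_likelihood(x, node_params, weights):
    """Naive baseline (assumed here for contrast): average the node parameters
    into a single Gaussian per feature before evaluating the density."""
    total = 0.0
    for j, value in enumerate(x):
        mean = sum(w * params[j][0] for w, params in zip(weights, node_params))
        std = sum(w * params[j][1] for w, params in zip(weights, node_params))
        total += math.log(gaussian_pdf(value, mean, std) + 1e-300)
    return total


# Hypothetical toy setup: two nodes, one feature, very different local statistics.
nodes = [
    [(0.0, 1.0)],   # node A: feature centred at 0
    [(10.0, 1.0)],  # node B: feature centred at 10
]
weights = [0.5, 0.5]  # assumed fixed weights, for illustration only
x = [0.0]             # a sample that is typical for node A

mix_ll = mixture_log_likelihood(x, nodes, weights)
col_ll = collapsed_log_likelihood(x, nodes, weights)
print(f"mixture: {mix_ll:.3f}  collapsed: {col_ll:.3f}")
```

In this toy case the mixture keeps a mode at each node's own mean, so a sample typical of node A still receives a high likelihood. The collapsed Gaussian is centred midway between the two nodes and assigns the same sample a far lower likelihood, which is the loss of statistical identity that the mixture formulation is meant to avoid.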
