Domain generalization uses knowledge from multiple related source domains, each with ample labeled training data, to improve inference on unseen in-distribution (IN) and out-of-distribution (OOD) domains. In this study, we introduce a two-phase representation learning technique based on multi-task learning. The approach builds a latent space from features that span multiple domains, both native and cross-domain, to improve generalization to IN and OOD domains. We also attempt to disentangle the latent space by minimizing the mutual information between the prior and the latent space, which is intended to suppress spurious feature correlations. Optimized jointly, these objectives are meant to promote domain-invariant feature learning. We evaluate the model on multiple cybersecurity datasets, reporting standard classification metrics on both unseen IN and OOD sets, and compare the results with contemporary domain generalization methods.
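To make the joint objective concrete, the following is a minimal sketch rather than the method itself. It assumes a diagonal-Gaussian encoder with a standard normal prior, and it swaps in the KL divergence between the encoder output and that prior as a common surrogate for a mutual information penalty. The function names, the weight `beta`, and the choice of cross-entropy for each task are all hypothetical and are not taken from the abstract.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of one example, computed from raw class scores."""
    m = max(logits)  # subtract the max for numerical stability
    log_norm = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_norm - logits[label]


def gaussian_kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for one latent code.

    Assumption: this KL term stands in for the mutual information
    penalty; the abstract does not specify the estimator used.
    """
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )


def joint_loss(task_logits, task_labels, mu, log_var, beta=0.1):
    """Sum of per-task classification losses plus a weighted KL penalty.

    task_logits / task_labels hold one entry per task (hypothetical
    multi-task heads sharing the same latent code).
    """
    task_term = sum(
        softmax_cross_entropy(logits, label)
        for logits, label in zip(task_logits, task_labels)
    )
    return task_term + beta * gaussian_kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    # Two hypothetical task heads sharing one 2-dimensional latent code.
    loss = joint_loss(
        task_logits=[[2.0, 0.5], [0.1, 1.2, -0.3]],
        task_labels=[0, 1],
        mu=[0.3, -0.2],
        log_var=[0.0, -0.5],
    )
    print(f"joint loss: {loss:.4f}")
```

In this sketch a larger `beta` pulls the latent code more strongly toward the prior at the expense of the task losses. How the two phases of training would schedule these terms is not specified in the abstract and is left open here.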