Domain generalization leverages knowledge from multiple related domains with ample labeled training data to improve inference on unseen in-distribution (IN) and out-of-distribution (OOD) domains. In this study, we introduce a two-phase representation learning technique based on multi-task learning. The approach learns a latent space from features spanning multiple domains, both native and cross-domain, to improve generalization to IN and OOD domains. Additionally, we attempt to disentangle the latent space by minimizing the mutual information between the prior and the latent space, which reduces spurious feature correlations. Jointly optimizing these objectives is intended to encourage domain-invariant feature learning. We evaluate the model on multiple cybersecurity datasets, using standard classification metrics on both unseen IN and OOD sets, and compare the results with contemporary domain generalization methods.