Domain generalization aims to leverage knowledge from multiple related source domains, each with ample labeled training data, to improve inference on unseen in-distribution (IN) and out-of-distribution (OOD) domains. In this study, we introduce a two-phase representation learning technique based on multi-task learning. The approach learns a latent space from features spanning multiple domains, covering both native and cross-domain features, to improve generalization to IN and OOD domains. Additionally, we attempt to disentangle the latent space by minimizing the mutual information between the prior and the latent space, with the goal of removing spurious feature correlations. Jointly optimizing these objectives facilitates domain-invariant feature learning. We evaluate the model on multiple cybersecurity datasets using standard classification metrics on unseen IN and OOD sets, and compare the results with contemporary domain generalization methods.
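The abstract does not specify how the multi-task losses are combined or how the mutual information term is estimated. As a hedged illustration only, the sketch below assumes a swapped-in, variational-information-bottleneck-style surrogate: the sum of per-task cross-entropy losses plus a weighted KL divergence between a diagonal-Gaussian encoder posterior and a standard normal prior. Averaged over the data, this KL term upper-bounds the mutual information between inputs and latent codes. This is not the authors' method; the function names, the Gaussian encoder parameterization (`mu`, `logvar`), and the weight `beta` are all hypothetical, and the two-phase training schedule is not modeled.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean softmax cross-entropy over a batch, using log-sum-exp for stability."""
    total = 0.0
    for row, y in zip(logits, labels):
        m = max(row)
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_z - row[y]
    return total / len(labels)


def kl_to_standard_normal(mu: Sequence[Sequence[float]],
                          logvar: Sequence[Sequence[float]]) -> float:
    """Batch mean of KL(N(mu, diag(exp(logvar))) || N(0, I)).

    Hypothetical surrogate: its data average upper-bounds I(x; z).
    """
    total = 0.0
    for m_row, lv_row in zip(mu, logvar):
        total += 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                           for m, lv in zip(m_row, lv_row))
    return total / len(mu)


def joint_objective(task_logits, task_labels, mu, logvar, beta: float = 1e-3) -> float:
    """Sum of per-task classification losses plus a weighted MI surrogate penalty.

    task_logits / task_labels: one batch of logits and labels per task head
    (hypothetical multi-task setup sharing a single latent code).
    """
    task_loss = sum(cross_entropy(l, y) for l, y in zip(task_logits, task_labels))
    return task_loss + beta * kl_to_standard_normal(mu, logvar)


# Usage: two task heads over the same batch of two samples with 2-D latent codes.
logits_a = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
logits_b = [[2.0, -1.0], [0.5, 0.5]]
loss = joint_objective([logits_a, logits_b], [[0, 2], [0, 1]],
                       mu=[[0.1, -0.2], [0.0, 0.3]],
                       logvar=[[0.0, 0.0], [-0.1, 0.1]])
print(loss)
```

A larger `beta` pushes the latent code closer to the prior and discards more input information; a smaller one favors task accuracy.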