Out-of-distribution (OOD) generalization is a critical ability for deep learning models in many real-world scenarios, including healthcare and autonomous vehicles. Recently, different techniques have been proposed to improve OOD generalization. Among these methods, gradient-based regularizers have shown promising performance compared with competing approaches. Despite this success, our understanding of the role of Hessian and gradient alignment in domain generalization is still limited. To address this shortcoming, we analyze the role of the classifier's head Hessian matrix and gradient in domain generalization using the recent OOD theory of transferability. Theoretically, we show that the spectral norm of the difference between the classifier's head Hessian matrices across domains is an upper bound on the transfer measure, a notion of distance between target and source domains. Furthermore, we analyze all the attributes that get aligned when we encourage similarity between Hessians and gradients. Our analysis explains the success of many regularizers, such as CORAL, IRM, V-REx, Fish, IGA, and Fishr, as they regularize part of the classifier's head Hessian and/or gradient. Finally, we propose two simple yet effective methods that match the classifier's head Hessians and gradients efficiently, without directly calculating Hessians: one based on the Hessian Gradient Product (HGP) and one based on Hutchinson's method (Hutchinson). We validate the OOD generalization ability of the proposed methods in different scenarios, including transferability, severe correlation shift, label shift, and diversity shift. Our results show that Hessian alignment methods achieve promising performance on various OOD benchmarks. The code is available at \url{https://github.com/huawei-noah/Federated-Learning/tree/main/HessianAlignment}.
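To illustrate how Hessian information can be used without forming the Hessian, the following minimal sketch computes a Hessian-gradient product and a Hutchinson estimate of the Hessian diagonal on a toy quadratic loss. It is not the authors' implementation: the names `grad`, `hvp`, and `hutchinson_diag`, the matrix `A`, and the point `w` are illustrative assumptions, and a central finite difference of the gradient stands in for the automatic differentiation a deep learning framework would normally provide. How these quantities enter the alignment loss is not specified in the abstract and is not shown here.

```python
import random

def grad(A, w):
    """Gradient of the toy loss f(w) = 0.5 * w^T A w (A symmetric), i.e. A w."""
    return [sum(a_ij * w_j for a_ij, w_j in zip(row, w)) for row in A]

def hvp(grad_fn, w, v, eps=1e-4):
    """Hessian-vector product H v from two gradient evaluations.

    Uses a central finite difference of the gradient, so the Hessian
    itself is never built (assumption: stands in for autodiff).
    """
    g_plus = grad_fn([wi + eps * vi for wi, vi in zip(w, v)])
    g_minus = grad_fn([wi - eps * vi for wi, vi in zip(w, v)])
    return [(gp - gm) / (2 * eps) for gp, gm in zip(g_plus, g_minus)]

def hutchinson_diag(grad_fn, w, n_samples=2000, seed=0):
    """Hutchinson-style estimate of diag(H): the average of z * (H z)
    over random Rademacher vectors z with entries in {-1, +1}."""
    rng = random.Random(seed)
    d = len(w)
    est = [0.0] * d
    for _ in range(n_samples):
        z = [rng.choice((-1.0, 1.0)) for _ in range(d)]
        hz = hvp(grad_fn, w, z)
        for i in range(d):
            est[i] += z[i] * hz[i] / n_samples
    return est

# Toy problem (hypothetical values): the Hessian of f is exactly A.
A = [[2.0, 0.5], [0.5, 1.0]]
w = [1.0, -1.0]
grad_fn = lambda x: grad(A, x)

g = grad_fn(w)                          # gradient at w
hgp = hvp(grad_fn, w, g)                # Hessian-gradient product H g
diag_est = hutchinson_diag(grad_fn, w)  # stochastic estimate of diag(H)
```

In this toy setting the Hessian equals `A`, so `hgp` can be checked against `A` times `g`, and `diag_est` should approach the diagonal of `A` as `n_samples` grows. Both quantities cost only a few gradient evaluations per vector, which is the property that makes Hessian matching practical for large models.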