Despite their impressive performance, deep neural networks (DNNs) struggle to generalize to out-of-distribution domains that differ from those seen in training. In practical applications, DNNs need both high standard accuracy and robustness to out-of-distribution domains. One technique that improves both is disentangled learning with a mixture distribution via auxiliary batch normalization layers (ABNs). This technique treats clean and transformed samples as different domains, allowing a DNN to learn better features from the mixed domains. However, when we distinguish sample domains by entropy, we find that some transformed samples are drawn from the same domain as clean samples, so the two sets of samples are not from completely different domains. To generate samples from a domain completely different from that of clean samples, we hypothesize that transforming clean high-entropy samples to further increase their entropy yields out-of-distribution samples that lie much farther from the in-distribution domain. Based on this hypothesis, we propose high entropy propagation (EntProp), which feeds high-entropy samples to the network that uses ABNs. We introduce two techniques, data augmentation and free adversarial training, that increase entropy and move samples farther from the in-distribution domain. These techniques incur no additional training cost. Our experimental results show that EntProp achieves higher standard accuracy and robustness at a lower training cost than the baseline methods. In particular, EntProp is highly effective when training on small datasets.
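As a rough illustration of the entropy-based view described in the abstract, the sketch below computes the Shannon entropy of each sample's softmax prediction and picks the highest-entropy samples in a batch. The abstract does not specify how high-entropy samples are chosen, so the top-k rule and the function names (`prediction_entropy`, `select_high_entropy`) are assumptions made for this example only, not the authors' implementation. The later steps (transforming the selected samples and routing them through ABNs) are not shown.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prediction_entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution for one sample."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_high_entropy(batch_logits, k):
    """Return (indices, entropies): indices of the k highest-entropy samples.

    Assumption: a simple top-k rule; the selection rule is hypothetical.
    """
    entropies = [prediction_entropy(logits) for logits in batch_logits]
    order = sorted(range(len(entropies)),
                   key=lambda i: entropies[i], reverse=True)
    return order[:k], entropies


# Toy batch of three samples with three classes each.
batch = [
    [8.0, 0.0, 0.0],   # confident prediction, low entropy
    [1.0, 1.0, 1.0],   # uniform prediction, maximal entropy (ln 3)
    [2.0, 1.5, 0.0],   # moderately uncertain prediction
]
indices, entropies = select_high_entropy(batch, k=2)
print(indices)
```

In this toy batch the uniform prediction has the largest entropy and the confident one the smallest, so the two selected samples are the second and third. In a setup like the one the abstract describes, only such selected samples would then be transformed further and treated as a separate domain.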