Traditional deep learning algorithms often fail to generalize when tested outside the domain of their training data. The issue can be mitigated by using unlabeled data from the target domain at training time, but because data distributions can change dynamically in real-life applications once a learned model is deployed, it is critical to create networks that are robust to unknown and unforeseen domain shifts. In this paper we focus on one of the reasons neural networks lack this robustness: deep networks rely only on the most obvious, potentially spurious, clues to make their predictions and are blind to useful but slightly less efficient or more complex patterns. This behaviour has been identified before, and several methods have partially addressed it. To investigate their effectiveness and limits, we first design a publicly available MNIST-based benchmark that precisely measures the ability of an algorithm to find the "hidden" patterns. We then evaluate state-of-the-art algorithms on this benchmark and show that the issue is largely unsolved. Finally, we propose a partially reversed contrastive loss that encourages intra-class diversity and finds less strongly correlated patterns; our experiments demonstrate its effectiveness.