In practical scenarios where training data is limited, many predictive signals in the data may instead stem from biases in data acquisition (i.e., be less generalizable), and one cannot prevent a model from co-adapting to such so-called "shortcut" signals; this makes the model fragile under various distribution shifts. To bypass such failure modes, we consider an adversarial threat model under a mutual information constraint, which covers a wider class of perturbations in training. This motivates us to extend the standard information bottleneck to additionally model the nuisance information. We propose an autoencoder-based training scheme to implement the objective, as well as practical encoder designs that facilitate the proposed hybrid discriminative-generative training for both convolutional and Transformer-based architectures. Our experimental results show that the proposed scheme improves the robustness of learned representations (remarkably, without using any domain-specific knowledge) with respect to multiple challenging reliability measures. For example, our model advances the state of the art on the recent, challenging OBJECTS benchmark in novelty detection from $78.4\%$ to $87.2\%$ in AUROC, while simultaneously improving corruption, background, and (certified) adversarial robustness. Code is available at https://github.com/jh-jeong/nuisance_ib.