A recent trend in deep learning has been toward training large-scale models with high parameter counts on very large datasets. However, the robustness of such models in real-world settings remains under-explored. In this work, we first benchmark these models under a range of perturbations and datasets that represent real-world distribution shifts, and show that their performance degrades under these shifts. We then discuss why existing robustification schemes based on full-model fine-tuning may not scale to very large networks and can also cause them to forget some of their desirable characteristics. Finally, we propose a simple and cost-effective method to address this problem, inspired by the knowledge transfer literature. It involves robustifying smaller models at a lower computational cost and then using them as teachers to tune a fraction of the large-scale network, reducing the overall computational overhead. We evaluate the proposed method under various vision perturbations, including the ImageNet-C, ImageNet-R, ImageNet-S, and ImageNet-A datasets, as well as in transfer learning and zero-shot evaluation setups on different datasets. Benchmark results show that our method induces robustness in these large-scale models efficiently, requiring significantly less time, while preserving the transfer learning and zero-shot properties of the original model, which none of the existing methods achieve.
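The teacher-student idea described in the abstract can be illustrated with a toy sketch. The abstract does not specify the loss, the models, or which parameters are tuned, so everything below is an assumption made for illustration: a standard temperature-softened KL distillation loss stands in for the knowledge transfer objective, a fixed list of logits stands in for the frozen part of the large model, and a small trainable bias stands in for the "fraction" of the network that is tuned. The function names `softmax`, `distillation_loss`, and `tune_bias_only` are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """T^2 * KL(teacher || student) on temperature-softened distributions.

    Assumption: a generic distillation objective, not necessarily the
    loss used in the paper.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))
    return kl * temperature ** 2


def tune_bias_only(frozen_logits, teacher_logits, steps=50, lr=0.5,
                   temperature=2.0):
    """Gradient descent on a trainable bias; the frozen logits never change.

    Assumption: the bias plays the role of the small tuned fraction of a
    large model whose remaining parameters are kept fixed.
    """
    bias = [0.0] * len(frozen_logits)
    p = softmax(teacher_logits, temperature)
    for _ in range(steps):
        student = [f + b for f, b in zip(frozen_logits, bias)]
        q = softmax(student, temperature)
        # Gradient of T^2 * KL(p || softmax(z / T)) w.r.t. z is T * (q - p).
        grad = [temperature * (qi - pi) for pi, qi in zip(p, q)]
        bias = [b - lr * g for b, g in zip(bias, grad)]
    return bias


# Toy usage: a small "robust teacher" guides the large model's tuned part.
frozen = [2.0, 0.5, -1.0]   # output of the frozen large model (made up)
teacher = [0.5, 2.0, -1.0]  # output of the robust small teacher (made up)
before = distillation_loss(frozen, teacher)
bias = tune_bias_only(frozen, teacher)
after = distillation_loss([f + b for f, b in zip(frozen, bias)], teacher)
```

In this sketch only the bias is updated, so the cost of each step is independent of how large the frozen part is, which is the intuition behind tuning a fraction of the network rather than fine-tuning all of it.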