Deep neural networks are applied in more and more areas of everyday life. However, they still lack essential abilities, such as robustly dealing with spatially transformed input signals. Approaches to mitigate this severe robustness issue are limited to two pathways: either models are implicitly regularised by increased sample variability (data augmentation), or they are explicitly constrained by hard-coded inductive biases. The limiting factor of the former is the size of the data space, which renders sufficient sample coverage intractable. The latter is limited by the engineering effort required to develop such inductive biases for every possible scenario. Instead, we take inspiration from human behaviour, where percepts are modified by mental or physical actions during inference. We propose a novel technique to emulate such an inference process for neural networks. This is achieved by traversing a sparsified inverse transformation tree during inference using parallel energy-based evaluations. Our proposed inference algorithm, called Inverse Transformation Search (ITS), is model-agnostic and equips the model with zero-shot pseudo-invariance to spatially transformed inputs. We evaluate our method on several benchmark datasets, including a synthesised ImageNet test set. ITS outperforms the baselines in all zero-shot test scenarios.
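The inference procedure outlined in the abstract (searching a pruned tree of inverse transformations and scoring candidates in parallel with an energy function) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the energy is taken to be the negative log-sum-exp of the logits, the sparsification is modelled as simple beam pruning, the candidate inverse transformations are 90° rotations, and `toy_model`, `templates`, `inverse_ops`, and `inverse_transformation_search` are hypothetical names.

```python
import numpy as np

def energy(logits):
    """Free-energy score -logsumexp(logits) per row; lower means more confident.
    (Assumed scoring rule; the abstract does not specify the energy function.)"""
    m = logits.max(axis=-1, keepdims=True)
    return -(m[..., 0] + np.log(np.exp(logits - m).sum(axis=-1)))

def inverse_transformation_search(x, model, inverse_ops, depth=3, beam_width=2):
    """Beam search over compositions of candidate inverse transformations.

    Returns (energy, image, path) for the lowest-energy candidate seen,
    where the untransformed input counts as the first candidate.
    """
    best = (float(energy(model(x[None]))[0]), x, ())
    beam = [best]
    for _ in range(depth):
        # Expand every node in the beam with every candidate inverse transformation.
        children = [(op(img), path + (name,))
                    for _, img, path in beam
                    for name, op in inverse_ops.items()]
        # Score all children in a single batched forward pass.
        scores = energy(model(np.stack([img for img, _ in children])))
        # Prune the tree: keep only the beam_width lowest-energy children.
        order = np.argsort(scores, kind="stable")[:beam_width]
        beam = [(float(scores[i]), children[i][0], children[i][1]) for i in order]
        if beam[0][0] < best[0]:
            best = beam[0]
    return best

# Hypothetical stand-in for a trained classifier: logits are negative squared
# distances to fixed class templates, so only canonically posed inputs score well.
rng = np.random.default_rng(0)
templates = rng.random((3, 8, 8))

def toy_model(batch):
    diff = batch[:, None] - templates[None]
    return -(diff ** 2).sum(axis=(2, 3))

# Hypothetical set of candidate inverse transformations.
inverse_ops = {
    "rot+90": lambda img: np.rot90(img, 1),
    "rot-90": lambda img: np.rot90(img, -1),
}

# A class-0 input rotated by 180 degrees, a pose the toy model never "saw".
x = np.rot90(templates[0], 2)
best_energy, best_img, best_path = inverse_transformation_search(
    x, toy_model, inverse_ops)
prediction = int(toy_model(best_img[None]).argmax())
```

In this toy setting the search needs two rotation steps to bring the input back to the pose the classifier expects, after which the usual argmax prediction is taken on the recovered image. The classifier itself is only called as a black box, which reflects the model-agnostic character claimed in the abstract.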