Convolutional neural networks have been shown to be applicable across a large number of fields when large amounts of labelled data are available. The recent trend has been to use models with increasingly large sets of tunable parameters to increase model accuracy, reduce model loss, or create more adversarially robust models. These goals are often at odds with one another. In particular, recent theoretical work raises questions about the ability of even larger models to generalize to data outside of the controlled train and test sets. As such, we examine the role of the number of hidden layers in the ResNet model, demonstrated on the MNIST, CIFAR10, and CIFAR100 datasets. We test a variety of parameters including the size of the model, the floating-point precision, and the noise level of both the training data and the model output. To encapsulate the model's predictive power and computational cost, we provide a method that uses induced failures to model the probability of failure as a function of time and relate that to a novel metric that allows us to quickly determine whether the cost of training a model outweighs the cost of attacking it. Using this approach, we are able to approximate the expected failure rate using a small number of specially crafted samples rather than increasingly large benchmark datasets. We demonstrate the efficacy of this technique on both the MNIST and CIFAR10 datasets using 8-, 16-, 32-, and 64-bit floating-point numbers, various data pre-processing techniques, and several attacks on five configurations of the ResNet model. Then, using empirical measurements, we examine the various trade-offs between cost, robustness, latency, and reliability and find that larger models do not significantly aid adversarial robustness despite costing significantly more to train.
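The idea of modelling failure probability as a function of time and comparing training cost with attack cost can be illustrated with a minimal sketch. This is not the paper's method or metric: it assumes a constant failure rate (an exponential survival model), and every name here (`fit_exponential_rate`, `failure_probability`, `train_to_attack_cost_ratio`) and every number is hypothetical, chosen only for illustration.

```python
import math


def fit_exponential_rate(durations, failed):
    """Maximum-likelihood failure rate under an ASSUMED exponential model.

    durations: seconds spent attacking each sample.
    failed:    True if the attack induced a misclassification (a failure),
               False if the attack was stopped first (right-censored).
    """
    if len(durations) != len(failed):
        raise ValueError("durations and failed must have the same length")
    exposure = sum(durations)
    if exposure <= 0:
        raise ValueError("total observed attack time must be positive")
    events = sum(1 for f in failed if f)
    # Exponential MLE with right-censoring: failures per unit of observed time.
    return events / exposure


def failure_probability(rate, t):
    """Probability that a failure has been induced by time t: 1 - exp(-rate * t)."""
    return 1.0 - math.exp(-rate * t)


def train_to_attack_cost_ratio(train_seconds, rate):
    """Hypothetical cost ratio: training time over expected time to failure.

    Under the exponential model the expected time to failure is 1 / rate,
    so the ratio is train_seconds * rate. Values above 1 mean training
    took longer than an attack is expected to take.
    """
    return train_seconds * rate


# Hypothetical measurements: four attacked samples, one attack censored.
durations = [2.0, 4.0, 1.0, 3.0]
failed = [True, True, False, True]

rate = fit_exponential_rate(durations, failed)
p_one_second = failure_probability(rate, 1.0)
ratio = train_to_attack_cost_ratio(100.0, rate)
print(rate, p_one_second, ratio)
```

Because the rate is estimated from the time spent attacking a handful of samples, a fit of this kind needs only a small set of crafted inputs rather than a full benchmark pass. A real analysis would likely need a more flexible survival model than the constant-rate one assumed here.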