Image recognition tasks typically use deep learning and require enormous processing power, so they rely on hardware accelerators such as GPUs and TPUs for fast, timely processing. Real-time image recognition tasks can fail because of sub-optimal mapping onto hardware accelerators during model deployment, which may lead to timing uncertainty and erroneous behavior. Mapping onto hardware accelerators is done using multiple software components, such as deep learning frameworks, compilers, and device libraries, which we refer to collectively as the computational environment. Owing to the increased use of image recognition in safety-critical applications such as autonomous driving and medical imaging, it is imperative to assess the robustness of these tasks to changes in the computational environment, because the impact of parameters such as deep learning frameworks, compiler optimizations, and hardware devices on model performance and correctness is not yet well understood. In this paper we present a differential testing framework, DeltaNN, that allows us to assess the impact of different computational environment parameters on the performance of image recognition models during deployment, after training. DeltaNN generates different implementations of a given image recognition model for variations in the environment parameters, namely deep learning frameworks, compiler optimizations, and hardware devices, and analyzes the resulting differences in model performance. Using DeltaNN, we conduct an empirical robustness study of three popular image recognition models on the ImageNet dataset. We report the impact in terms of misclassifications and inference time differences across the different settings. We observed output label differences of up to 100% across deep learning frameworks, and unexpected performance degradation of up to 81% in inference time when applying compiler optimizations.
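
To make the two reported measures concrete, the following is a minimal sketch of how output label differences and inference time differences between two implementations of the same model could be computed. It is not DeltaNN's implementation; the function names `label_mismatch_rate` and `inference_time_change` and the sample values are hypothetical and chosen for illustration only.

```python
from typing import Sequence


def label_mismatch_rate(reference: Sequence[int], candidate: Sequence[int]) -> float:
    """Percentage of inputs whose predicted label differs between two implementations.

    Hypothetical helper: both sequences hold the top-1 label per input, in the same order.
    """
    if len(reference) != len(candidate):
        raise ValueError("both runs must cover the same inputs")
    if not reference:
        raise ValueError("need at least one prediction")
    differing = sum(r != c for r, c in zip(reference, candidate))
    return 100.0 * differing / len(reference)


def inference_time_change(baseline_s: float, variant_s: float) -> float:
    """Relative change in inference time, in percent; positive means the variant is slower.

    Hypothetical helper: both arguments are measured inference times in seconds.
    """
    if baseline_s <= 0:
        raise ValueError("baseline time must be positive")
    return 100.0 * (variant_s - baseline_s) / baseline_s


# Illustrative values only, not results from the study.
labels_framework_a = [1, 2, 3, 4]
labels_framework_b = [1, 2, 0, 4]
print(label_mismatch_rate(labels_framework_a, labels_framework_b))  # 25.0
print(inference_time_change(2.0, 3.0))  # 50.0
```

In a differential setting, one implementation serves as the reference and each variant (another framework, optimization level, or device) is compared against it on the same inputs.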