In this work, we address the question of identifying the failure conditions of a given image classifier. To do so, we exploit the ability of recent Generative Adversarial Networks (StyleGAN2) to produce controllable, high-quality image distributions: failure conditions are expressed as directions of strong performance degradation in the latent space of the generative model. This analysis strategy is used to discover corner cases that combine multiple sources of corruption, and to compare the behavior of different classifiers in more detail. The directions of degradation can also be rendered visually by generating data along them, for better interpretability. Some degradations, such as image quality, can affect all classes, whereas others, such as shape, are more class-specific. The approach is demonstrated on the MNIST dataset augmented with two sources of corruption, noise and blur, and shows a promising way to better understand and control the risks of using Artificial Intelligence components in safety-critical applications.
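To make the central idea concrete, the sketch below shows one plausible way to search for such a "direction of strong performance degradation" in a generator's latent space: gradient ascent on a unit direction along which the classifier's loss on the generated images increases. This is a minimal illustration under stated assumptions, not the paper's actual procedure; the names `generator`, `classifier`, and the step size `alpha` are hypothetical placeholders for a pretrained StyleGAN2 generator, the classifier under test, and the perturbation magnitude.

```python
import torch
import torch.nn.functional as F

def find_degradation_direction(generator, classifier, labels, z,
                               steps=200, lr=0.05, alpha=3.0):
    """Search for a unit-norm latent direction v such that images generated
    at z + alpha * v are poorly classified (hypothetical sketch).

    generator, classifier: assumed pretrained torch.nn.Module instances.
    labels: intended class labels of the latent codes z.
    """
    v = torch.randn(z.shape[1], requires_grad=True)
    opt = torch.optim.Adam([v], lr=lr)
    for _ in range(steps):
        direction = v / (v.norm() + 1e-8)          # keep the direction unit-norm
        images = generator(z + alpha * direction)   # shift all samples along v
        logits = classifier(images)
        loss = -F.cross_entropy(logits, labels)     # maximize classifier loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (v / v.norm()).detach()
```

Rendering images at increasing multiples of the returned direction (e.g., `generator(z + t * v)` for growing `t`) would then visualize how the degradation manifests, in the spirit of the interpretability use described above.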