As DNNs advance into safety-critical applications, testing approaches for such models have gained attention. One current direction is the search for and identification of systematic weaknesses that put safety assumptions based on average performance values at risk. Such weaknesses can take the form of (semantically coherent) subsets or areas in the input space where a DNN performs systematically worse than its expected average. However, it is non-trivial to attribute the observed low performance to the specific semantic features that describe the subset. For instance, inhomogeneities within the data w.r.t. other (non-considered) attributes might distort results, while taking into account all (available) attributes and their interactions is often computationally very expensive. Inspired by counterfactual explanations, we propose an effective and computationally cheap algorithm to validate the semantic attribution of existing subsets, i.e., to check whether the identified attribute is likely to have caused the degraded performance. We demonstrate this approach on an example from the autonomous driving domain using highly annotated simulated data, where we show for a semantic segmentation model that (i) performance differences among the different pedestrian assets exist, but (ii) only in some cases is the asset type itself the reason for the reduced performance.
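The abstract does not spell out the proposed algorithm, so the following is only a hypothetical illustration of the attribution problem it addresses, not the authors' method. The sketch compares the raw performance gap of a subset with the gap that remains after each subset sample is paired with samples outside the subset that agree on all other annotated attributes, loosely in the spirit of a counterfactual comparison. The function name `matched_gap`, the attribute names `asset` and `distance`, the score key `iou`, and all data values are assumptions made up for this example.

```python
from statistics import mean


def matched_gap(samples, attribute, value, score_key="iou"):
    """Return (raw_gap, matched_gap) for the subset where sample[attribute] == value.

    raw_gap:     mean score outside the subset minus mean score inside it.
    matched_gap: the same difference, but each subset sample is compared only
                 with outside samples sharing all of its other attribute
                 values. None if no subset sample has such a match.
    """
    subset = [s for s in samples if s[attribute] == value]
    rest = [s for s in samples if s[attribute] != value]
    # All annotated attributes except the one under test and the score itself.
    others = sorted(k for k in samples[0] if k not in (attribute, score_key))

    raw_gap = mean(s[score_key] for s in rest) - mean(s[score_key] for s in subset)

    # Group outside samples by their values on the other attributes.
    by_context = {}
    for s in rest:
        context = tuple(s[k] for k in others)
        by_context.setdefault(context, []).append(s[score_key])

    # For each subset sample, compare against outside samples in the same context.
    diffs = []
    for s in subset:
        context = tuple(s[k] for k in others)
        if context in by_context:
            diffs.append(mean(by_context[context]) - s[score_key])

    return raw_gap, (mean(diffs) if diffs else None)


# Made-up data: the score depends only on distance, but asset "A" appears
# mostly far away, so it looks worse on average.
samples = (
    [{"asset": "A", "distance": "near", "iou": 0.8}]
    + [{"asset": "A", "distance": "far", "iou": 0.4}] * 3
    + [{"asset": "B", "distance": "near", "iou": 0.8}] * 3
    + [{"asset": "B", "distance": "far", "iou": 0.4}]
)

raw, matched = matched_gap(samples, "asset", "A")
print(f"raw gap: {raw:.2f}, matched gap: {matched:.2f}")
```

In this toy data the raw gap for asset "A" is about 0.2, while the matched gap is zero, which would suggest that distance rather than the asset type explains the lower average. Exact matching only works when the other attributes are discrete and the data is dense enough to contain matches; continuous attributes would need binning or a nearest-neighbour criterion.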