This paper introduces a new technique for measuring the feature dependency of neural network models. The motivation is to better understand a model by querying whether it uses information from human-understandable features, e.g., anatomical shape, volume, or image texture. Our method is based on the principle that if a model depends on a feature, then removing that feature should significantly harm its performance. A targeted feature is "removed" by collapsing the dimension of the data distribution that corresponds to that feature. We do this by moving data points along the feature dimension to a baseline feature value while staying on the data manifold, as estimated by a deep generative model. We then observe how the model's performance changes on the modified test set, in which the target feature dimension has been removed. We test our method on deep neural network models in three settings: synthetic image data with known ground truth, an Alzheimer's disease prediction task using MRI and hippocampus segmentations from the OASIS-3 dataset, and a cell nuclei classification task using the Lizard dataset.
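The collapse-and-measure idea described above can be illustrated with a minimal toy sketch. It is not the paper's implementation. It makes several simplifying assumptions: an invertible linear map (`W`) stands in for the deep generative model, the target feature is taken to be a single latent coordinate, and the baseline is the mean feature value over the test set. All names (`encode`, `decode`, `collapse_feature`, `feature_dependency`) are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# ASSUMPTION: an invertible linear decoder stands in for a deep generative model.
W = np.array([[2.0, 1.0], [0.5, 1.5]])
W_inv = np.linalg.inv(W)

def decode(z):
    """Map latent codes to data space (toy generative model)."""
    return z @ W.T

def encode(x):
    """Map data back to latent codes (exact inverse of the toy decoder)."""
    return x @ W_inv.T

def collapse_feature(x, feature_idx, baseline):
    """Move every point along one latent feature dimension to a baseline value."""
    z = encode(x)
    z[:, feature_idx] = baseline
    return decode(z)

def accuracy(predict, x, y):
    return float(np.mean(predict(x) == y))

def feature_dependency(predict, x, y, feature_idx, baseline):
    """Performance drop caused by collapsing the target feature."""
    x_collapsed = collapse_feature(x, feature_idx, baseline)
    return accuracy(predict, x, y) - accuracy(predict, x_collapsed, y)

# Synthetic test set: both latent features are predictive of the label.
n = 2000
y = rng.integers(0, 2, size=n)
signal = (2 * y - 1) * 1.5
z = np.stack([signal + rng.normal(size=n), signal + rng.normal(size=n)], axis=1)
x = decode(z)

# Two hand-built "models": one reads feature 0, the other reads feature 1.
def model_a(x):
    return (encode(x)[:, 0] > 0).astype(int)

def model_b(x):
    return (encode(x)[:, 1] > 0).astype(int)

# ASSUMPTION: the baseline is the mean value of the target feature.
baseline = float(encode(x)[:, 0].mean())
dep_a = feature_dependency(model_a, x, y, feature_idx=0, baseline=baseline)
dep_b = feature_dependency(model_b, x, y, feature_idx=0, baseline=baseline)
print(f"dependency of model_a on feature 0: {dep_a:.3f}")
print(f"dependency of model_b on feature 0: {dep_b:.3f}")
```

In this toy setting, the model that reads feature 0 should lose most of its accuracy once that feature is collapsed, while the model that reads only feature 1 should be essentially unaffected. With real images, the linear map would be replaced by a trained generative model and the feature by a measured quantity such as shape or volume.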