Artificial neural networks have proven to be extremely useful models, enabling many recent breakthroughs in Artificial Intelligence and other fields. However, they are typically regarded as black boxes, since it is difficult for humans to interpret how these models reach their results. In this work, we propose a method that allows one to modify what an artificial neural network perceives with respect to specific human-defined concepts, enabling the generation of hypothetical scenarios that can help understand, and even debug, the neural network model. Through an empirical evaluation on a synthetic dataset and on the ImageNet dataset, we test the proposed method on different models, assessing whether the performed manipulations are correctly interpreted by the models and analyzing how the models react to them.