Convolutional Neural Networks are particularly well suited to image analysis tasks such as Image Classification, Object Recognition, and Image Segmentation. Like all Artificial Neural Networks, however, they are "black box" models and suffer from poor explainability. This work addresses the specific downstream task of Emotion Recognition from images. It proposes a framework that combines techniques based on Class Activation Mapping (CAM) with Object Detection at the corpus level, in order to better understand which image cues a particular model, in our case EmoNet, relies on when assigning an emotion to an image. We demonstrate that the model focuses mostly on human characteristics, and we also explore the pronounced effect of specific image modifications.
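To make the idea of combining CAM heatmaps with Object Detection at the corpus level more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a CAM heatmap and a list of detected boxes are already available for each image, and it uses a hypothetical aggregation metric (the share of total activation falling inside each detected label's boxes, averaged per predicted emotion). The function names `cam_mass_per_label` and `corpus_attention` and the data layout are illustrative assumptions.

```python
from collections import defaultdict

import numpy as np


def cam_mass_per_label(cam, detections):
    """Share of total CAM activation inside each detected label's boxes.

    cam: 2D non-negative array of shape (H, W), e.g. a Grad-CAM heatmap.
    detections: list of (label, (x0, y0, x1, y1)) in pixel coordinates,
        with x1 and y1 exclusive. This layout is an assumption.
    """
    total = cam.sum()
    if total <= 0:
        return {}
    masks = {}
    for label, (x0, y0, x1, y1) in detections:
        mask = masks.setdefault(label, np.zeros(cam.shape, dtype=bool))
        # Union of boxes per label, so overlapping boxes are not double counted.
        mask[y0:y1, x0:x1] = True
    return {label: float(cam[mask].sum() / total) for label, mask in masks.items()}


def corpus_attention(samples):
    """Average the per-image shares over a corpus, grouped by predicted emotion.

    samples: iterable of (predicted_emotion, cam, detections).
    Images in which a label is not detected contribute zero for that label.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for emotion, cam, detections in samples:
        counts[emotion] += 1
        for label, share in cam_mass_per_label(cam, detections).items():
            sums[emotion][label] += share
    return {
        emotion: {label: s / counts[emotion] for label, s in labels.items()}
        for emotion, labels in sums.items()
    }
```

Under these assumptions, a high average share for a label such as "person" within one emotion class would suggest that the model's activation for that emotion concentrates on regions where people are detected. Other overlap measures (for example, thresholding the heatmap first) are equally plausible choices.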