The lack of interpretability of deep convolutional neural networks (DCNNs) is a well-known problem, particularly in the medical domain, where clinicians want trustworthy automated decisions. One way to improve trust is to demonstrate the localisation of feature representations with respect to expert-labelled regions of interest. In this work, we investigate the localisation of features learned via two different learning paradigms and demonstrate the superiority of one learning approach with respect to localisation. Our analysis on medical and natural datasets shows that the traditional end-to-end (E2E) learning strategy has a limited ability to localise discriminative features across multiple network layers. We show that a layer-wise learning strategy, namely cascade learning (CL), results in more localised features. In terms of localisation accuracy, we show not only that CL outperforms E2E but also that it is a promising method for predicting regions. On the YOLO object detection framework, our best result shows that CL outperforms the E2E scheme by $2\%$ in mAP.