The correlation between the sharpness of loss minima and generalisation in deep neural networks has long been debated. Whereas this relationship has mostly been investigated on selected benchmark data sets from computer vision, we explore it for the acoustic scene classification task of the DCASE2020 challenge data. Our analysis is based on two-dimensional filter-normalised visualisations and a derived sharpness measure. Our exploratory analysis shows that sharper minima tend to generalise better than flat minima, even more so for out-of-domain data recorded from previously unseen devices, thus adding to the dispute about whether flat minima generalise better. We further find that the choice of optimiser in particular is a main driver of the sharpness of minima, and we discuss the resulting limitations with respect to comparability. Our code, trained model states and loss landscape visualisations are publicly available.
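To make the idea of a two-dimensional filter-normalised visualisation concrete, the following is a minimal sketch of the general technique, not the authors' released code. It assumes a single weight tensor whose first axis indexes filters and a toy quadratic loss; the names `filter_normalise`, `loss_surface` and `toy_loss` are hypothetical and chosen for illustration only. Each random direction is rescaled filter by filter to the norm of the corresponding trained filter, and the loss is then evaluated on a grid spanned by two such directions.

```python
import numpy as np


def filter_normalise(direction, weights, eps=1e-10):
    """Rescale each filter of `direction` to the norm of the matching filter
    in `weights`. Both arrays have shape (num_filters, ...)."""
    d = direction.reshape(direction.shape[0], -1)
    w = weights.reshape(weights.shape[0], -1)
    d_norm = np.linalg.norm(d, axis=1, keepdims=True)
    w_norm = np.linalg.norm(w, axis=1, keepdims=True)
    scaled = d * (w_norm / (d_norm + eps))
    return scaled.reshape(weights.shape)


def loss_surface(loss_fn, weights, steps=11, span=1.0, seed=0):
    """Evaluate `loss_fn` on a steps x steps grid around `weights`, spanned by
    two random filter-normalised directions."""
    rng = np.random.default_rng(seed)
    d1 = filter_normalise(rng.standard_normal(weights.shape), weights)
    d2 = filter_normalise(rng.standard_normal(weights.shape), weights)
    coords = np.linspace(-span, span, steps)
    surface = np.empty((steps, steps))
    for i, a in enumerate(coords):
        for j, b in enumerate(coords):
            surface[i, j] = loss_fn(weights + a * d1 + b * d2)
    return coords, surface


# Assumption: a toy quadratic loss stands in for a trained network's loss.
weights = np.ones((4, 3, 3))  # 4 hypothetical filters of size 3x3


def toy_loss(w):
    return float(np.sum((w - 1.0) ** 2))


coords, surface = loss_surface(toy_loss, weights)
```

In a real setting, `loss_fn` would load the perturbed parameters into the model and compute the loss on a fixed data set, and the normalisation would be applied per layer; how a scalar sharpness value is derived from such a surface is left open here.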