The correlation between the sharpness of loss minima and generalisation in deep neural networks has long been debated. Whereas this relationship has mostly been investigated on selected benchmark data sets in computer vision, we explore it for the audio scene classification task of the DCASE2020 challenge data. Our analysis is based on two-dimensional filter-normalised visualisations of the loss landscape and a sharpness measure derived from them. Our exploratory analysis shows that sharper minima tend to generalise better than flat minima, even more so for out-of-domain data recorded from previously unseen devices, which adds to the dispute about whether flat minima generalise better. We further find that the choice of optimiser, in particular, is a main driver of the sharpness of minima, and we discuss the resulting limitations with respect to comparability. Our code, trained model states and loss landscape visualisations are publicly available.
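To illustrate what a two-dimensional filter-normalised visualisation involves, the following is a minimal sketch and not the authors' released code. It assumes the common formulation in which each filter of a random Gaussian direction is rescaled to the norm of the corresponding filter in the trained weights, and the loss is then evaluated on a grid spanned by two such directions. The function names (`filter_normalise`, `random_direction`, `loss_surface`), the list-of-filters weight layout and the toy quadratic loss are all hypothetical choices made for illustration. The derived sharpness measure is not sketched, since the abstract does not define it.

```python
import math
import random


def filter_normalise(direction, weights, eps=1e-10):
    """Rescale each filter in `direction` to the norm of the matching filter in `weights`.

    Both arguments are lists of filters; each filter is a flat list of floats
    (an assumed layout, chosen to keep the sketch dependency-free).
    """
    normalised = []
    for d_filter, w_filter in zip(direction, weights):
        d_norm = math.sqrt(sum(x * x for x in d_filter))
        w_norm = math.sqrt(sum(x * x for x in w_filter))
        scale = w_norm / (d_norm + eps)  # eps guards against a zero-norm direction
        normalised.append([x * scale for x in d_filter])
    return normalised


def random_direction(weights, rng):
    """Draw a Gaussian direction with the same shape as `weights`."""
    return [[rng.gauss(0.0, 1.0) for _ in w_filter] for w_filter in weights]


def loss_surface(loss_fn, weights, d1, d2, alphas, betas):
    """Evaluate loss_fn(weights + a * d1 + b * d2) for every (a, b) on the grid."""
    surface = []
    for a in alphas:
        row = []
        for b in betas:
            perturbed = [
                [w + a * x + b * y for w, x, y in zip(w_f, f1, f2)]
                for w_f, f1, f2 in zip(weights, d1, d2)
            ]
            row.append(loss_fn(perturbed))
        surface.append(row)
    return surface


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy stand-in for trained weights: two "filters" of different sizes.
    weights = [[3.0, 4.0], [0.5, 0.0, 0.0]]

    def toy_loss(ws):
        # Hypothetical quadratic loss; a real analysis would evaluate the network.
        return sum(x * x for f in ws for x in f)

    d1 = filter_normalise(random_direction(weights, rng), weights)
    d2 = filter_normalise(random_direction(weights, rng), weights)
    grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
    for row in loss_surface(toy_loss, weights, d1, d2, grid, grid):
        print(" ".join(f"{value:8.3f}" for value in row))
```

In a real experiment, `toy_loss` would be replaced by the evaluation loss of the trained network on a fixed data split, and the resulting grid would be drawn as a contour or surface plot. The per-filter rescaling is intended to make surfaces comparable across models whose weights differ in scale.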