This paper presents a novel approach to fundamental frequency (F0) detection that combines Convolutional Neural Networks (CNNs) with image processing techniques to estimate pitch directly from spectrogram images. The approach achieves high detection accuracy: 92% of the predicted pitch contours show strong or moderate correlation with the true pitch contours. Furthermore, an experimental comparison with other state-of-the-art CNN methods shows that our approach improves the detection rate by approximately 5% across various Signal-to-Noise Ratio (SNR) conditions.
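
To make the contour-level figure concrete, the following is a minimal sketch of how the share of predicted contours with strong or moderate correlation to the true contours could be computed. It assumes the Pearson coefficient as the correlation measure and uses illustrative cut-offs of 0.7 (strong) and 0.4 (moderate); neither the measure nor the thresholds are stated in the abstract, and the function names are hypothetical.

```python
import math

import numpy as np

# Assumed cut-offs for illustration only; the abstract does not state them.
STRONG_THRESHOLD = 0.7
MODERATE_THRESHOLD = 0.4


def contour_correlation(pred_hz, true_hz):
    """Pearson correlation between two equal-length pitch contours (in Hz)."""
    pred = np.asarray(pred_hz, dtype=float)
    true = np.asarray(true_hz, dtype=float)
    if pred.ndim != 1 or pred.shape != true.shape or pred.size < 2:
        raise ValueError("contours must be 1-D, equal length, with >= 2 frames")
    # The coefficient is undefined when either contour is constant.
    if pred.std() == 0.0 or true.std() == 0.0:
        return math.nan
    return float(np.corrcoef(pred, true)[0, 1])


def correlation_label(r):
    """Map a coefficient to 'strong', 'moderate', or 'weak' (assumed bins)."""
    # NaN and negative coefficients fail both comparisons and count as weak.
    if r >= STRONG_THRESHOLD:
        return "strong"
    if r >= MODERATE_THRESHOLD:
        return "moderate"
    return "weak"


def fraction_strong_or_moderate(contour_pairs):
    """Share of (predicted, true) contour pairs labelled strong or moderate."""
    labels = [correlation_label(contour_correlation(p, t)) for p, t in contour_pairs]
    return sum(label != "weak" for label in labels) / len(labels)
```

Under these assumptions, a value of 0.92 returned by `fraction_strong_or_moderate` over the test set would correspond to the 92% figure reported above.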