Pitch extraction is a core task in music and sound processing. We present a convolutional neural network specialized for pitch extraction from the human singing voice in a cappella performances. The network is trained on a combination of synthetic data and auto-labeled a cappella singing, so that examples with exactly known pitch are complemented by real sung audio. Evaluation on datasets of synthetic sounds, opera recordings, and time-stretched vowels demonstrates its efficacy. This work supports improved pitch extraction in both music and voice settings.
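To illustrate why synthetic data is attractive for training a pitch extractor, the following minimal sketch generates a harmonic tone whose fundamental frequency is known by construction, so the label comes for free. It is not the synthesis procedure used in this work: the function name `synth_harmonic_tone`, the 16 kHz sample rate, the number of harmonics, and the 1/k amplitude roll-off are all assumptions chosen for illustration.

```python
import numpy as np

def synth_harmonic_tone(f0_hz, duration_s=0.5, sample_rate=16000, n_harmonics=5):
    """Return (audio, f0_label) for a harmonic tone with a known pitch.

    Hypothetical helper: all parameter defaults are illustrative assumptions.
    """
    t = np.arange(int(duration_s * sample_rate)) / sample_rate
    audio = np.zeros_like(t)
    for k in range(1, n_harmonics + 1):
        # Skip partials at or above the Nyquist frequency to avoid aliasing.
        if k * f0_hz < sample_rate / 2:
            # 1/k roll-off gives a simple, voice-like decaying spectrum.
            audio += np.sin(2 * np.pi * k * f0_hz * t) / k
    audio /= np.max(np.abs(audio))  # normalize peak amplitude to 1
    return audio.astype(np.float32), float(f0_hz)

# One labeled training example: waveform plus its ground-truth pitch in Hz.
audio, label = synth_harmonic_tone(220.0)
```

Each call yields a waveform paired with its exact pitch, which is the property that makes synthetic examples useful alongside auto-labeled recordings, where the labels are themselves estimates.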