In music production and audio processing, automatic pitch correction of the singing voice, commonly known as Auto-Tune, has significantly transformed vocal performance. While auto-tuning technology lets musicians correct vocal pitch to a desired level of precision, its use has also sparked debate about its impact on authenticity and artistic integrity. As a result, detecting and analyzing Auto-Tuned vocals in music recordings has become essential for music scholars, producers, and listeners. However, to the best of our knowledge, no prior effort has been made in this direction. This study introduces a data-driven approach that uses triplet networks to detect Auto-Tuned songs, supported by a new dataset of original and Auto-Tuned audio clips. Experimental results show that the proposed method outperforms RawNet2, an end-to-end model proposed for anti-spoofing and widely used for other audio forensic tasks, in both accuracy and robustness.
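For readers unfamiliar with triplet networks, the sketch below illustrates the standard triplet margin loss that such networks are typically trained with. It is only an illustration under stated assumptions: the abstract does not specify the loss, the embedding model, or how triplets are formed, so the function names, the margin value, and the toy 2-D embeddings are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the anchor is closer to the positive than to
    the negative by at least `margin` (an assumed hyperparameter).
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy embeddings (made up for illustration). One plausible setup, not
# confirmed by the abstract: anchor and positive come from clips of the
# same class, and the negative comes from the other class.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]   # distance 1 from the anchor
negative = [3.0, 4.0]   # distance 5 from the anchor

# Well-separated triplet: 1 - 5 + 1 < 0, so the loss is clipped to 0.
loss_easy = triplet_margin_loss(anchor, positive, negative)

# Swapping the roles violates the margin: 5 - 1 + 1 = 5.
loss_hard = triplet_margin_loss(anchor, negative, positive)
```

In practice the embeddings would come from a learned network applied to audio features, and the loss would be averaged over many triplets during training.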