Automatic Singing Assessment and Singing Information Processing have evolved over the past three decades to support singing pedagogy, performance analysis, and vocal training. While the former objectively evaluates a singer's performance through computational metrics, ranging from real-time visual feedback and acoustic biofeedback to sophisticated pitch tracking and spectral analysis, the latter compares an input vocal signal with a target reference to capture the nuanced information embedded in the singing voice. Notable advances include interactive systems that have significantly improved real-time visual feedback, and the integration of machine learning and deep neural network architectures that enhance the precision of vocal signal processing. This survey critically examines the literature to map the historical evolution of these technologies and to identify and discuss key gaps. The analysis reveals persistent challenges: the lack of standardized evaluation frameworks, difficulties in reliably separating vocal signals from various noise sources, and the underutilization of advanced digital signal processing and artificial intelligence methods for capturing artistic expressivity. By detailing these limitations and the corresponding technological advances, this review shows how addressing them can bridge the gap between objective computational assessment and subjective, human-like evaluation of singing performance, ultimately enhancing both the technical accuracy and the pedagogical relevance of automated singing evaluation systems.
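The reference-comparison approach described above can be illustrated, under simplifying assumptions, as an alignment of pitch contours. The sketch below is hypothetical, not a system from the surveyed literature: the f0 sequences are hand-written stand-ins for a pitch tracker's output, and `dtw_distance` is a textbook dynamic time warping cost used as a crude similarity score between a sung take and a target melody.

```python
def dtw_distance(a, b):
    """Dynamic time warping cost between two 1-D f0 contours (Hz).

    A lower cost means the performed contour follows the reference
    more closely, tolerating local timing differences.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])          # local pitch deviation
            cost[i][j] = d + min(cost[i - 1][j],      # skip a frame of a
                                 cost[i][j - 1],      # skip a frame of b
                                 cost[i - 1][j - 1])  # align the frames
    return cost[n][m]

# Hypothetical f0 tracks: a target melody and a slightly detuned,
# slightly stretched performance of it.
reference = [220.0, 246.9, 261.6, 293.7, 329.6]
performance = [221.0, 246.0, 262.0, 262.5, 294.0, 330.0]

score = dtw_distance(reference, performance)
print(round(score, 1))
```

A real system would extract the contours with a pitch tracker (e.g. an autocorrelation or pYIN-style estimator) and normalize for key and tempo before scoring; DTW here simply makes the "compare performance against reference" idea concrete.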