Audio super-resolution (SR), also referred to as bandwidth extension (BWE), aims to reconstruct high-fidelity signals from low-resolution (LR) or band-limited (BL) observations. The task is inherently ill-posed because the missing high-frequency (HF) content is ambiguous. This survey gives an overview of the field, with a particular focus on the paradigm shift from discriminative mapping to modern generative modeling. We first review early discriminative deep neural network (DNN) models, which formulate BWE/SR as a deterministic mapping problem and are prone to regression-to-the-mean effects and spectral over-smoothing. We then systematically cover generative approaches, including autoregressive (AR) models, variational autoencoders (VAEs), generative adversarial networks (GANs), diffusion and score-based models, flow-based methods, and Schrödinger bridges. Across these approaches, we examine key design aspects: representation domain, architecture, conditioning mechanisms, and the trade-offs among reconstruction fidelity, perceptual quality, robustness, and computational efficiency. We also discuss emerging directions involving large language models (LLMs) and multimodal foundation models, and highlight open challenges in perceptual evaluation, phase modeling, and real-world generalization. By providing a structured taxonomy and a unified perspective, this survey offers a foundation and a practical roadmap for advancing BWE/SR from deterministic point estimation toward distribution-aware generative modeling.