We propose a speech enhancement system for multitrack audio that minimizes auditory masking while still allowing listeners to hear multiple simultaneous speakers. The system can be used in several communication scenarios, such as teleconferencing, in-game voice chat, and live streaming. The amount of masking in the audio signals is evaluated with the ITU-R BS.1387 Perceptual Evaluation of Audio Quality (PEAQ) model. Audio effects such as level balance, equalization, dynamic range compression, and spatialization are applied via an iterative Harmony Search algorithm that aims to minimize this masking. In a subjective listening test, mixes produced by the system were competitive with mixes by professional sound engineers and outperformed mixes by existing automatic mixing systems.
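To illustrate the optimization loop described above, here is a minimal sketch of a basic Harmony Search (memory consideration, pitch adjustment, random selection, replace-the-worst). It is not the authors' implementation. Several assumptions apply. Only per-track gains (level balance) are optimized. The parameter names `hms`, `hmcr`, `par`, and `bandwidth` and their values are illustrative. The PEAQ-based masking measure is replaced by a hypothetical toy cost, `toy_masking_cost`, which uses made-up speaker levels and an assumed common target level.

```python
import random
from typing import Callable, List, Sequence, Tuple


def harmony_search(
    cost: Callable[[List[float]], float],
    bounds: Sequence[Tuple[float, float]],
    hms: int = 20,          # harmony memory size (illustrative value)
    hmcr: float = 0.9,      # harmony memory considering rate
    par: float = 0.3,       # pitch adjusting rate
    bandwidth: float = 0.5, # maximum pitch-adjustment step
    iterations: int = 5000,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimize `cost` over a box using a basic Harmony Search."""
    rng = random.Random(seed)
    # Harmony memory: hms random parameter vectors within the bounds.
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(hms)]
    scores = [cost(h) for h in memory]

    for _ in range(iterations):
        new = []
        for d, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                # Memory consideration: reuse this dimension from a stored harmony.
                value = memory[rng.randrange(hms)][d]
                if rng.random() < par:
                    # Pitch adjustment: small random perturbation.
                    value += rng.uniform(-bandwidth, bandwidth)
            else:
                # Random selection from the full allowed range.
                value = rng.uniform(lo, hi)
            new.append(min(max(value, lo), hi))  # keep within bounds
        new_score = cost(new)
        # Replace the worst harmony in memory if the new one is better.
        worst = max(range(hms), key=scores.__getitem__)
        if new_score < scores[worst]:
            memory[worst], scores[worst] = new, new_score

    best = min(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]


# Hypothetical per-speaker levels (dB) before mixing; not data from the paper.
RAW_LEVELS_DB = [-30.0, -18.0, -25.0]
TARGET_DB = -23.0  # assumed common target level


def toy_masking_cost(gains_db: List[float]) -> float:
    # Toy stand-in for a PEAQ-based masking measure: penalize speakers whose
    # post-gain level departs from the common target, since a much louder
    # speaker tends to mask quieter ones.
    return sum((lvl + g - TARGET_DB) ** 2 for lvl, g in zip(RAW_LEVELS_DB, gains_db))


gains, score = harmony_search(toy_masking_cost, bounds=[(-12.0, 12.0)] * 3)
print([round(g, 1) for g in gains], round(score, 3))
```

For this toy cost, the ideal gains are simply `TARGET_DB - level` for each track. The search should move close to them. In a real system, the cost would come from a perceptual masking model, and the parameter vector would also include EQ, compression, and panning settings.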