We consider speech enhancement for signals that are picked up in one noisy environment and must be rendered to a listener in another noisy environment. For both far-end noise reduction and near-end listening enhancement, it has been shown that an overemphasis on noise suppression or intelligibility maximization may lead to excessive speech distortions and quality degradation in favorable noise conditions, where intelligibility is already at ceiling level. Recently, [1,2] proposed to remedy this with a minimum processing framework that reduces noise or enhances listening only by the minimum amount needed for a given intelligibility criterion to remain satisfied. Additionally, it has been shown that jointly considering both environments improves speech enhancement performance. In this paper, we formulate a joint far- and near-end minimum processing framework that improves intelligibility while limiting speech distortions in favorable noise conditions. We provide closed-form solutions for specific boundary scenarios and investigate performance in the general case using numerical optimization. We also show that concatenating existing far- and near-end minimum processing methods preserves the effects of the original methods. Results show that the joint optimization can further improve performance compared to the concatenated approach.