We consider speech enhancement for signals picked up in one noisy environment that must be rendered to a listener in another noisy environment. For both far-end noise reduction and near-end listening enhancement, it has been shown that an overemphasis on noise suppression or intelligibility maximization may lead to excessive speech distortions and quality degradations in favorable noise conditions, where intelligibility is already at ceiling level. Recently, [1,2] proposed to remedy this with a minimum processing framework that applies only the minimum amount of noise reduction or listening enhancement needed to satisfy a given intelligibility criterion. Additionally, it has been shown that joint consideration of both environments improves speech enhancement performance. In this paper, we formulate a joint far- and near-end minimum processing framework that improves intelligibility while limiting speech distortions in favorable noise conditions. We provide closed-form solutions for specific boundary scenarios and investigate performance in the general case using numerical optimization. We also show that concatenating existing minimum processing far- and near-end enhancement methods preserves the effects of the original methods. Results show that the joint optimization can further improve performance compared to the concatenated approach.