This paper presents FoVNet, a novel multi-channel speech enhancement approach that enables highly efficient speech enhancement within a configurable field of view (FoV) of a smart-glasses user, without requiring the direction(s) of specific target talker(s). It advances over prior work by enhancing all speakers within any given FoV, using a hybrid signal-processing and deep-learning approach designed for high computational efficiency. The neural network component requires ultra-low computation (about 50 MMACS). A multi-channel Wiener filter and a post-processing module are further used to improve perceptual quality. We evaluate our algorithm using a microphone array on smart glasses, providing a configurable, efficient solution for augmented hearing on energy-constrained devices. FoVNet excels in both computational efficiency and speech quality across multiple scenarios, making it a promising solution for smart-glasses applications.
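The abstract names a multi-channel Wiener filter (MWF) as a stage of the hybrid pipeline but gives no formula. As a rough, hedged sketch of what a generic MWF computes, the speech-distortion-weighted MWF for one frequency bin is w = (Φ_ss + μΦ_nn)^{-1} Φ_ss e_ref, and the speech estimate at the reference microphone is w^H y. How FoVNet actually estimates the speech and noise covariances (presumably with help from its neural network) and which MWF variant it uses are not stated here. The function names `mwf_weights` and `apply_beamformer`, the trade-off parameter `mu`, and the toy covariances are illustrative assumptions, not the paper's implementation.

```python
import numpy as np


def mwf_weights(phi_ss: np.ndarray, phi_nn: np.ndarray, ref: int = 0, mu: float = 1.0) -> np.ndarray:
    """Generic multi-channel Wiener filter weights for one frequency bin (illustrative sketch).

    phi_ss, phi_nn: (M, M) Hermitian speech and noise spatial covariance matrices.
    ref: index of the reference microphone whose speech component is estimated.
    mu: hypothetical speech-distortion trade-off; mu = 1 gives the standard MWF.
    Returns w of shape (M,), so that the speech estimate is w^H y.
    """
    e_ref = np.zeros(phi_ss.shape[0], dtype=complex)
    e_ref[ref] = 1.0
    # Solve (Phi_ss + mu * Phi_nn) w = Phi_ss e_ref instead of forming an explicit inverse.
    return np.linalg.solve(phi_ss + mu * phi_nn, phi_ss @ e_ref)


def apply_beamformer(W: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Apply per-bin weights W (F, M) to a multi-channel STFT Y (F, M, T), giving (F, T)."""
    # For each bin f and frame t: sum over mics m of conj(W[f, m]) * Y[f, m, t].
    return np.einsum("fm,fmt->ft", W.conj(), Y)


# Toy example (assumed values): one bin, 4 mics, rank-1 speech with
# steering vector a, unit speech power, and unit-power white noise per mic.
a = np.ones(4, dtype=complex)
phi_ss = np.outer(a, a.conj())
phi_nn = np.eye(4)
w = mwf_weights(phi_ss, phi_nn)
print(np.round(w.real, 3))               # [0.2 0.2 0.2 0.2]
print(np.round((w.conj() @ a).real, 3))  # speech gain 0.8
```

In this toy case the filter passes the speech with gain 0.8 while reducing the white-noise output power from 1 to 0.16 (4 × 0.2²), which illustrates the noise-reduction versus speech-distortion trade-off that the MWF balances and that a post-processing stage may further refine.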