Invariance to microphone array configuration is a rare attribute in neural beamformers. Filter-and-sum (FS) methods in this class define the target signal with respect to a reference channel. However, this not only complicates the formulation in reverberant conditions but also burdens the network, which must have a mechanism to infer which channel is the reference. To address these issues, this study presents the Delay Filter-and-Sum Network (DFSNet), a steerable neural beamformer for causal speech enhancement that is invariant to the number of microphones and the array geometry. In DFSNet, the acquired signals are first steered toward the speech source direction prior to the FS operation, which reduces the task to estimating delay-and-summed reverberant clean speech. The proposed model is designed for low latency, low distortion, and low memory and computational cost, which gives it strong potential for hearing aid applications. Simulation results show performance comparable to that of noncausal state-of-the-art methods.
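The steering step that precedes the FS operation can be illustrated with a conventional far-field delay-and-sum sketch. This is not DFSNet itself. It assumes a known source direction `doa`, free-field plane-wave propagation, a speed of sound of 343 m/s, and per-frequency STFT phase shifts as a narrowband delay model. The function names `steering_delays`, `steer`, and `delay_and_sum` are hypothetical. The sketch shows why a steered representation need not depend on array configuration: the delays are computed from arbitrary microphone coordinates, and averaging over channels yields an output whose shape is independent of the number of microphones.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at roughly 20 degrees C


def steering_delays(mic_pos, doa):
    """Far-field arrival delay (s) of each mic relative to the array origin.

    mic_pos: (M, 3) microphone coordinates in metres (any geometry, any M).
    doa: (3,) vector pointing from the array toward the source (assumed known).
    A mic displaced toward the source hears the wavefront earlier,
    so its delay is negative.
    """
    u = np.asarray(doa, dtype=float)
    u = u / np.linalg.norm(u)
    return -(np.asarray(mic_pos, dtype=float) @ u) / SPEED_OF_SOUND


def steer(stft, mic_pos, doa, fs, n_fft):
    """Phase-align a multichannel one-sided STFT of shape (M, F, T) toward doa."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)  # (F,) bin centre frequencies in Hz
    tau = steering_delays(mic_pos, doa)  # (M,)
    # Undo each channel's propagation delay: multiply by exp(+j*2*pi*f*tau_m).
    comp = np.exp(2j * np.pi * tau[:, None] * freqs[None, :])  # (M, F)
    return stft * comp[:, :, None]


def delay_and_sum(stft, mic_pos, doa, fs, n_fft):
    """Mean of the steered channels; output shape (F, T) does not depend on M."""
    return steer(stft, mic_pos, doa, fs, n_fft).mean(axis=0)


if __name__ == "__main__":
    # Synthetic plane wave: every channel is the same spectrum S, delayed by tau_m.
    rng = np.random.default_rng(0)
    fs, n_fft, n_frames = 16000, 512, 10
    mics = np.array([[0.0, 0.0, 0.0], [0.05, 0.0, 0.0], [0.0, 0.07, 0.01]])
    doa = np.array([1.0, 1.0, 0.0])
    n_bins = n_fft // 2 + 1
    S = rng.standard_normal((n_bins, n_frames)) + 1j * rng.standard_normal((n_bins, n_frames))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    tau = steering_delays(mics, doa)
    X = S[None] * np.exp(-2j * np.pi * tau[:, None, None] * freqs[None, :, None])
    # After steering, all channels coincide, so their average recovers S.
    print(np.allclose(delay_and_sum(X, mics, doa, fs, n_fft), S))
```

Applying `delay_and_sum` to the reverberant clean speech images, rather than picking a single reference channel, gives one plausible reading of the delay-and-summed target described above. In reverberant conditions only the direct path is aligned by this step, and a learned FS stage would operate on the output of `steer`.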