In this paper, we introduce a causal, low-latency, low-complexity approach for binaural multichannel blind speaker separation in noisy reverberant conditions. The model, referred to as Group Communication Binaural Filter and Sum Network (GCBFSnet), predicts complex filters for filter-and-sum beamforming in the time-frequency domain. To reduce the complexity of a simple model containing only one convolutional and one recurrent module, we apply Group Communication (GC): latent model variables are split into groups, and each group is processed by a shared sequence model. With GC, we reduce the size of the model by up to 83 % and its complexity by up to 73 % compared to the model without GC, while mostly retaining performance. Even in its smallest configuration, GCBFSnet matches the performance of a low-complexity TasNet baseline in most metrics, even though the baseline is larger and requires more operations.
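The two mechanisms named in the abstract can be illustrated with a minimal NumPy sketch. It is not the GCBFSnet implementation: the function names, tensor layouts, the multi-frame filter convention (no complex conjugation), and the use of a single dense layer with `tanh` in place of the shared sequence model are all assumptions made for illustration. Any exchange of information between groups is omitted as well.

```python
import numpy as np


def filter_and_sum(X, W):
    """Causal filter-and-sum beamforming in the time-frequency domain.

    X: complex STFT of the microphone signals, shape (M, T, F).
    W: complex filters, shape (M, L, T, F), with L taps over past frames
       (hypothetical layout; in the paper the filters are predicted by the network).
    Returns one output channel Y of shape (T, F).
    """
    M, T, F = X.shape
    L = W.shape[1]
    Y = np.zeros((T, F), dtype=complex)
    for l in range(min(L, T)):
        # Delay the input by l frames; zero-pad at the start so that
        # frame t only depends on frames t, t-1, ..., t-L+1.
        X_delayed = np.zeros_like(X)
        X_delayed[:, l:, :] = X[:, :T - l, :]
        # Filter each microphone channel and sum over microphones.
        Y += np.sum(W[:, l] * X_delayed, axis=0)
    return Y


def grouped_shared_layer(h, weight, bias, num_groups):
    """Split latent features into groups and apply one shared layer to each.

    h: latent features, shape (T, N), with N divisible by num_groups.
    weight: shape (N // num_groups, N // num_groups), shared by all groups.
    bias: shape (N // num_groups,), shared by all groups.
    The dense layer is a stand-in for the shared sequence model.
    """
    T, N = h.shape
    assert N % num_groups == 0, "feature size must be divisible by num_groups"
    group_size = N // num_groups
    groups = h.reshape(T, num_groups, group_size)
    out = np.tanh(groups @ weight + bias)  # same parameters for every group
    return out.reshape(T, N)


def dense_param_count(n_features, num_groups=1):
    """Parameters of a square dense layer applied per group with shared weights."""
    g = n_features // num_groups
    return g * g + g
```

Because the weights are shared, the layer's parameter count depends only on the group size. With the hypothetical values of 128 features and 4 groups, `dense_param_count(128, 4)` covers a 32-by-32 layer, whereas `dense_param_count(128)` covers the full 128-by-128 layer. These toy numbers are not meant to reproduce the reductions reported above.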