Speech separation in realistic acoustic environments remains challenging because overlapping speakers, background noise, and reverberation must be resolved simultaneously. Although recent time-frequency (TF) domain models have shown strong performance, most still rely on late-split architectures, where speaker disentanglement is deferred to the final stage, creating an information bottleneck and weakening discriminability under adverse conditions. To address this issue, we propose SR-CorrNet, an asymmetric encoder-decoder framework that introduces the separation-reconstruction (SepRe) strategy into a TF dual-path backbone. The encoder performs coarse separation from mixture observations, while the weight-shared decoder progressively reconstructs speaker-discriminative features with cross-speaker interaction, enabling stage-wise refinement. To complement this architecture, we formulate speech separation as a structured correlation-to-filter problem: spatio-spectro-temporal correlations computed from the observations are used as input features, and the corresponding deep filters are estimated to recover target signals. We further incorporate an attractor-based dynamic split module to adapt the number of output streams to the actual speaker configuration. Experimental results on WSJ0-{2,3,4,5}Mix, WHAMR!, and LibriCSS demonstrate consistent improvements across anechoic, noisy-reverberant, and real-recorded conditions in both single- and multi-channel settings, highlighting the effectiveness of TF-domain SepRe with correlation-based filter estimation for speech separation.
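The correlation-to-filter formulation can be illustrated with a minimal NumPy sketch. It is written under stated assumptions and is not the authors' implementation. For every time-frequency bin, it stacks a small neighbourhood of frames and frequency bins across all channels into a vector z. It then forms a locally averaged correlation matrix Phi = E[z z^H] as the input feature and applies a deep filter as X_hat = w^H z. The helper names (`stack_context`, `local_correlation`, `correlation_features`, `apply_deep_filter`) are hypothetical, as are the context sizes, the causal averaging window, and the real/imaginary upper-triangle feature layout. The filter that a trained network would estimate is replaced by a fixed one-hot placeholder.

```python
import numpy as np

def stack_context(Y, t_ctx=1, f_ctx=1):
    """Stack a (2*f_ctx+1) x (2*t_ctx+1) TF neighbourhood of every channel.

    Y: complex multichannel STFT, shape (C, F, T); edges are zero-padded.
    Returns z[f, t] for every bin, shape (F, T, D), D = C * (2*f_ctx+1) * (2*t_ctx+1).
    """
    C, F, T = Y.shape
    Yp = np.pad(Y, ((0, 0), (f_ctx, f_ctx), (t_ctx, t_ctx)))
    taps = [Yp[:, df:df + F, dt:dt + T]                  # shifted copies, each (C, F, T)
            for df in range(2 * f_ctx + 1)
            for dt in range(2 * t_ctx + 1)]
    Z = np.stack(taps, axis=1)                           # (C, n_taps, F, T)
    return Z.reshape(C * len(taps), F, T).transpose(1, 2, 0)

def local_correlation(Z, avg_frames=3):
    """Phi[f, t] = mean of z z^H over the current and (avg_frames - 1) past frames."""
    outer = np.einsum("ftd,fte->ftde", Z, Z.conj())      # (F, T, D, D), Hermitian per bin
    Phi = np.empty_like(outer)
    for t in range(Z.shape[1]):
        t0 = max(0, t - avg_frames + 1)
        Phi[:, t] = outer[:, t0:t + 1].mean(axis=1)
    return Phi

def correlation_features(Phi):
    """Real-valued network input: real and imaginary parts of Phi's upper triangle."""
    iu, ju = np.triu_indices(Phi.shape[-1])
    upper = Phi[..., iu, ju]                             # (F, T, D*(D+1)/2)
    return np.concatenate([upper.real, upper.imag], axis=-1)

def apply_deep_filter(W, Z):
    """Deep filtering for one output stream: X_hat[f, t] = w[f, t]^H z[f, t]."""
    return np.einsum("ftd,ftd->ft", W.conj(), Z)

# Toy example: random 2-channel STFT with 5 frequency bins and 7 frames.
rng = np.random.default_rng(0)
C, F, T = 2, 5, 7
Y = rng.standard_normal((C, F, T)) + 1j * rng.standard_normal((C, F, T))
Z = stack_context(Y)                                     # D = 2 * 3 * 3 = 18
Phi = local_correlation(Z)
feats = correlation_features(Phi)                        # what a network would consume

# Placeholder for the network output: a one-hot filter on channel 0's centre tap,
# which reproduces the reference-channel mixture exactly.
W = np.zeros_like(Z)
W[..., 4] = 1.0                                          # tap 4 = (df=1, dt=1) of channel 0
X_hat = apply_deep_filter(W, Z)
```

In a separation model, one filter W per output speaker would be predicted from features such as `feats`. The one-hot filter only serves as a check that the filtering step is wired correctly.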
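The attractor-based dynamic split can be sketched as follows, assuming a counting rule in the style of encoder-decoder attractor (EDA) methods. Candidate attractors are emitted in order, and a linear layer followed by a sigmoid gives each one an existence probability. The speaker count is the number of attractors before the first probability that falls below a threshold. The shared features are then expanded into that many streams. The function names, the untrained classifier weights, and the multiplicative conditioning in `dynamic_split` are hypothetical choices for illustration, not the module described in the abstract.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def count_speakers(attractors, w, b, threshold=0.5):
    """Keep attractors up to (not including) the first one whose existence
    probability falls below `threshold`; return that count and all probabilities.

    attractors: (S_max, E) candidate attractor vectors, in emission order.
    w, b: weights of a linear existence classifier (hypothetical values here).
    """
    p = sigmoid(attractors @ w + b)
    below = np.flatnonzero(p < threshold)
    n_spk = int(below[0]) if below.size else len(p)
    return n_spk, p

def dynamic_split(features, attractors, n_spk):
    """Expand shared features (T, F, E) into n_spk speaker streams by multiplying
    each stream with one active attractor (an assumed conditioning scheme)."""
    return features[None] * attractors[:n_spk, None, None, :]   # (n_spk, T, F, E)

attractors = np.array([[1.0] * 4, [0.5] * 4, [2.0] * 4, [-1.0] * 4, [1.0] * 4])
w, b = np.ones(4), 0.0
n_spk, probs = count_speakers(attractors, w, b)   # logits 4, 2, 8, -4, 4
features = np.random.default_rng(0).standard_normal((6, 5, 4))
streams = dynamic_split(features, attractors, n_spk)
```

Here the fifth attractor also has a high existence probability, but it is ignored because counting stops at the fourth attractor, the first one below the threshold. This yields three output streams.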