This paper proposes a speech enhancement method that uses ego-noise references with a microphone array embedded in an unmanned aerial vehicle (UAV). The ego-noise reference signals are captured by microphones located near the UAV's propellers and are used in the prior knowledge multichannel Wiener filter (PK-MWF) to obtain the speech correlation matrix estimate. To detect speech activity, the speech presence probability (SPP) is estimated either from an external microphone near the speech source, which provides a performance benchmark, or from one of the embedded microphones, which represents a more realistic scenario. Experimental measurements are performed in a semi-anechoic chamber with a UAV mounted on a stand and a loudspeaker playing a speech signal, at three distinct, fixed propeller rotation speeds that yield three different signal-to-noise ratios (SNRs). The recordings, which are made available online, are used to compare the proposed method with the standard multichannel Wiener filter (MWF), estimated both with and without the propeller microphones in its formulation. Results show that, compared with both MWF variants, the PK-MWF achieves larger improvements in speech intelligibility and quality, as measured by STOI and PESQ, while the SNR improvement is similar.
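As a rough illustration of the baseline mentioned in the abstract, the following is a minimal sketch of a standard MWF (not the PK-MWF) for a single frequency bin, with a noise-reference channel included in the formulation. It assumes the speech-plus-noise and noise-only correlation matrices are estimated by SPP-weighted averaging and that the filter takes the common form w = Ryy⁻¹ Rss e_ref. The function names (`weighted_corr`, `mwf_weights`, `demo`), the synthetic mixing vectors, and the oracle SPP are all hypothetical choices made for this example; none of them are taken from the paper.

```python
import numpy as np


def weighted_corr(Y, w):
    """Weighted spatial correlation: sum_t w[t] * y_t y_t^H / sum_t w[t].

    Y has shape (M, T): M channels, T frames of one frequency bin.
    """
    return (Y * w) @ Y.conj().T / (w.sum() + 1e-12)


def mwf_weights(Y, spp, ref=0, diag_load=1e-6):
    """Standard MWF weights w = Ryy^{-1} Rss e_ref (illustrative sketch only)."""
    M = Y.shape[0]
    Ryy = weighted_corr(Y, spp)          # speech-plus-noise statistics
    Rnn = weighted_corr(Y, 1.0 - spp)    # noise-only statistics
    Rss = Ryy - Rnn                      # speech correlation matrix estimate
    # Small diagonal loading (an assumption) keeps the solve well conditioned.
    Ryy = Ryy + diag_load * np.trace(Ryy).real / M * np.eye(M)
    return np.linalg.solve(Ryy, Rss[:, ref])


def demo(seed=0, T=4000):
    """Synthetic check: three array channels plus one noise-reference channel."""
    rng = np.random.default_rng(seed)

    def cn(*shape):
        # Unit-variance circular complex Gaussian samples.
        return (rng.standard_normal(shape)
                + 1j * rng.standard_normal(shape)) / np.sqrt(2)

    active = (np.arange(T) // 200) % 2 == 1   # alternating speech on/off blocks
    s = cn(T) * active                        # speech source (zero when inactive)
    v = 3.0 * cn(T)                           # strong ego-noise source

    # Hypothetical mixing vectors; the last channel (reference) receives no speech.
    a = np.array([1.0, 0.8 + 0.3j, 0.6 - 0.4j, 0.0])
    b = np.array([1.0, 0.9 - 0.2j, 0.7 + 0.5j, 2.0])

    X = np.outer(a, s)                        # speech component per channel
    N = np.outer(b, v) + 0.1 * cn(4, T)       # ego-noise plus sensor noise
    Y = X + N

    w = mwf_weights(Y, active.astype(float))  # oracle SPP used as a benchmark

    def snr_db(x, n):
        return 10 * np.log10(np.mean(np.abs(x[active]) ** 2)
                             / np.mean(np.abs(n[active]) ** 2))

    snr_in = snr_db(X[0], N[0])
    snr_out = snr_db(w.conj() @ X, w.conj() @ N)  # filter output is w^H y
    return snr_in, snr_out


if __name__ == "__main__":
    snr_in, snr_out = demo()
    print(f"input SNR: {snr_in:.1f} dB, output SNR: {snr_out:.1f} dB")
```

In a real system the filter would be computed per frequency bin of an STFT, and the SPP would come from an estimator rather than from known speech activity. The PK-MWF described in the abstract differs in how the speech correlation matrix is obtained, which this sketch does not attempt to reproduce.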