Voice activity detection (VAD) is a key component of applications such as speech recognition. We analyze the impact of window size on the accuracy of three VAD algorithms, Silero, WebRTC, and Root Mean Square (RMS), across a diverse set of real-world digital audio streams. We additionally explore applying hysteresis on top of each VAD output. Silero significantly outperforms WebRTC and RMS, and hysteresis provides a benefit for WebRTC. These results offer a practical reference for tuning VAD systems.
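To make the two ideas in the abstract concrete, the sketch below shows a windowed RMS detector followed by a hysteresis stage. It is illustrative only: the abstract does not specify the hysteresis scheme, thresholds, or window sizes used in the study, so the frame-count hysteresis, the non-overlapping windows, and the names `rms_vad` and `apply_hysteresis` are all assumptions made for this example.

```python
import math
from typing import List, Sequence


def rms_vad(samples: Sequence[float], window: int, threshold: float) -> List[bool]:
    """Flag each non-overlapping window whose RMS exceeds `threshold`.

    Assumption: windows do not overlap and a trailing partial window is dropped.
    """
    flags = []
    for start in range(0, len(samples) - window + 1, window):
        frame = samples[start:start + window]
        rms = math.sqrt(sum(x * x for x in frame) / window)
        flags.append(rms > threshold)
    return flags


def apply_hysteresis(flags: Sequence[bool], on_count: int, off_count: int) -> List[bool]:
    """Smooth raw VAD decisions with a frame-count hysteresis (assumed scheme).

    The output switches to speech only after `on_count` consecutive speech
    frames, and back to non-speech only after `off_count` consecutive
    non-speech frames.
    """
    state = False  # start in the non-speech state
    run = 0        # consecutive frames that disagree with the current state
    smoothed = []
    for flag in flags:
        if flag != state:
            run += 1
            needed = on_count if flag else off_count
            if run >= needed:
                state = flag
                run = 0
        else:
            run = 0
        smoothed.append(state)
    return smoothed


if __name__ == "__main__":
    # Synthetic signal: silence, a constant-amplitude burst, silence.
    signal = [0.0] * 4 + [0.5] * 4 + [0.0] * 4
    raw = rms_vad(signal, window=4, threshold=0.1)
    print(raw)
    print(apply_hysteresis(raw, on_count=1, off_count=2))
```

Because the hysteresis stage only consumes a sequence of per-window booleans, the same function could in principle be placed after any detector that emits one decision per window. A larger `off_count` bridges short pauses inside an utterance at the cost of a delayed end-of-speech decision.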