Infant crying can serve as a crucial indicator of various physiological and emotional states. This paper introduces a comprehensive approach for detecting infant cries within audio data. We integrate Meta's Wav2Vec with traditional audio features, such as Mel-frequency cepstral coefficients (MFCCs), chroma, and spectral contrast, employing Gradient Boosting Machines (GBM) for cry classification. We validate our approach on a real-world dataset, demonstrating significant performance improvements over existing methods.
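To make the feature fusion described above concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a Wav2Vec embedding has already been computed and pooled into a single vector per clip, that the traditional features are averaged over time, and that the two are simply concatenated before being passed to a gradient boosting classifier. The function names (`handcrafted_features`, `fuse`, `train_gbm`), the choice of 13 MFCCs, and the use of librosa and scikit-learn are illustrative choices that do not come from the abstract.

```python
import numpy as np


def handcrafted_features(y, sr, n_mfcc=13):
    """Time-averaged MFCC, chroma, and spectral-contrast features for one clip.

    Assumption: mean pooling over frames; the abstract does not say how
    frame-level features are summarized.
    """
    import librosa  # imported lazily so the rest of the sketch runs without it

    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)
    contrast = librosa.feature.spectral_contrast(y=y, sr=sr)
    # Each matrix has shape (n_features, n_frames); average over frames.
    return np.concatenate([m.mean(axis=1) for m in (mfcc, chroma, contrast)])


def fuse(embedding, handcrafted):
    """Concatenate a pooled Wav2Vec embedding with the handcrafted features.

    Assumption: plain concatenation; the abstract only says the two feature
    families are integrated.
    """
    return np.concatenate([
        np.asarray(embedding, dtype=float).ravel(),
        np.asarray(handcrafted, dtype=float).ravel(),
    ])


def train_gbm(X, labels, seed=0):
    """Fit a gradient boosting classifier on fused feature vectors."""
    from sklearn.ensemble import GradientBoostingClassifier

    clf = GradientBoostingClassifier(random_state=seed)
    return clf.fit(X, labels)
```

In use, each clip would yield one fused vector, the vectors would be stacked into a matrix `X` with one row per clip, and `train_gbm(X, labels)` would be called with binary cry / non-cry labels.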