Speech recognition has become an important task in machine learning and artificial intelligence. In this study, we explore keyword spotting using both classical machine learning and deep learning techniques. For feature engineering, we convert raw waveforms to Mel Frequency Cepstral Coefficients (MFCCs), which serve as inputs to our models. We experiment with several algorithms: Hidden Markov Models with Gaussian mixtures, Convolutional Neural Networks (CNNs), and variants of Recurrent Neural Networks (RNNs), including Long Short-Term Memory (LSTM) networks and an attention mechanism. In our experiments, an RNN with a bidirectional LSTM (BiLSTM) and attention achieves the best performance, with an accuracy of 93.9%.
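To make the feature-engineering step concrete, the following is a minimal NumPy sketch of a conventional MFCC pipeline: pre-emphasis, framing and windowing, power spectrum, mel filterbank, log compression, and a DCT. It is not the implementation used in this study. The sample rate (16 kHz), frame length (25 ms), hop (10 ms), FFT size, number of mel filters (40), number of coefficients (13), and all function names are assumptions chosen for illustration.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):     # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def dct_ii(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out terms."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    scale = np.full(n_out, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale

def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=40, n_coeffs=13):
    """Return an array of shape (n_frames, n_coeffs). All defaults are assumed values."""
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < frame_len:            # pad very short clips to one full frame
        signal = np.pad(signal, (0, frame_len - len(signal)))
    # Pre-emphasis boosts high frequencies before spectral analysis.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # Power spectrum, mel filterbank energies, log compression, then DCT.
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    return dct_ii(np.log(energies + 1e-10), n_coeffs)

# Usage: one second of a synthetic 440 Hz tone standing in for a keyword clip.
t = np.arange(16000) / 16000.0
features = mfcc(np.sin(2 * np.pi * 440.0 * t))
print(features.shape)  # (98, 13)
```

Each row of the resulting matrix describes one short frame of audio, so a one-second clip becomes a sequence of frame-level vectors that a CNN can treat as a 2-D input or an RNN can consume step by step. In practice, an audio library would typically replace this hand-written version.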