Deep neural networks (DNNs) have been shown to outperform many traditional machine learning algorithms in Automatic Speech Recognition (ASR). In this paper, we show that effective Neural Architecture Optimization can substantially improve the accuracy of deep speech models at a very low computational cost. Experiments on the widely used LibriSpeech and TIMIT benchmarks support this claim: our method discovers and trains novel candidate models within a few hours (less than a day), many times faster than attention-based seq2seq models. It achieves a test Word Error Rate (WER) of 7% on the LibriSpeech corpus and a Phone Error Rate (PER) of 13% on the TIMIT corpus, on par with state-of-the-art results.
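WER and PER are both token-level edit-distance ratios, (S + D + I) / N, where S, D, and I count the substitutions, deletions, and insertions needed to turn the reference into the hypothesis and N is the reference length. WER computes this ratio over words and PER computes it over phones. The following is a minimal Python sketch of this standard computation. It is not the authors' scoring script: the function names are hypothetical, and the sketch applies no text normalization or phone-set folding.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    # prev[j] holds the distance between the previous ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete reference token r
                curr[j - 1] + 1,         # insert hypothesis token h
                prev[j - 1] + (r != h),  # substitute (cost 0 on a match)
            )
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """(S + D + I) / N: WER for word sequences, PER for phone sequences."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# Word-level example: one substitution (sat -> sit) and one deletion ("the").
ref_words = "the cat sat on the mat".split()
hyp_words = "the cat sit on mat".split()
wer = error_rate(ref_words, hyp_words)  # 2 / 6

# Phone-level example: one substitution (ae -> ah).
per = error_rate(["k", "ae", "t"], ["k", "ah", "t"])  # 1 / 3
```

Because the denominator is the reference length, insertions can push either rate above 100%.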