With the rise of Bidirectional Encoder Representations from Transformers (BERT) models in natural language processing, the speech community has adopted some of their development methodologies. The Wav2Vec models were introduced in this context to reduce the amount of data required to obtain state-of-the-art results. This work builds on that line of research and improves the performance of pre-trained speech models by simply replacing the dense layer used for fine-tuning with a lateral inhibition layer inspired by the biological process of the same name. Our experiments on Romanian, a low-resource language, show an average improvement of 12.5% in word error rate (WER) when using the lateral inhibition layer. In addition, we obtain state-of-the-art results on both the Romanian Speech Corpus and the Robin Technical Acquisition Corpus, with 1.78% WER and 29.64% WER, respectively.
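Since every result above is reported as a word error rate, a minimal sketch of how WER is commonly computed may help: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions, not taken from this work, and the authors' evaluation pipeline may apply additional text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); splits on whitespace only and
    applies no casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substitution ("sat" -> "sit") and one deletion ("the").
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER = {wer:.2%}")
```

Under this definition, a 1.78% WER corresponds to fewer than two word errors per hundred reference words.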