In this study, we propose a novel adversarial reprogramming (AR) approach for low-resource spoken command recognition (SCR) and build an AR-SCR system. The AR procedure modifies acoustic signals from the target domain so that a pretrained SCR model from the source domain can be repurposed for the target task. To resolve label mismatches between the source and target domains and to further improve the stability of AR, we propose a novel similarity-based label mapping technique that aligns classes. In addition, we combine transfer learning (TL) with the original AR process to improve model adaptation capability. We evaluate the proposed AR-SCR system on three low-resource SCR datasets: Arabic, Lithuanian, and dysarthric Mandarin speech. Experimental results show that, with a pretrained acoustic model (AM) trained on a large-scale English dataset, the proposed AR-SCR system outperforms the current state-of-the-art results on the Arabic and Lithuanian speech command datasets while using only a limited amount of training data.
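The two ideas named in the abstract, modifying the target-domain input and mapping source classes onto target classes, can be illustrated with a minimal sketch. It is not the authors' implementation. The additive perturbation `delta`, the cosine similarity over class embeddings, the top-`k` assignment, and all function names are assumptions made for illustration only; the abstract does not specify any of them.

```python
import numpy as np

def reprogram(x, delta):
    """Add a trainable perturbation to a target-domain signal (assumed additive form)."""
    return x + delta

def similarity_label_map(source_emb, target_emb, k):
    """Assign each target class its k most similar source classes.

    Cosine similarity over class embeddings is an assumption; a source class
    may be assigned to more than one target class in this simple sketch.
    """
    s = source_emb / np.linalg.norm(source_emb, axis=1, keepdims=True)
    t = target_emb / np.linalg.norm(target_emb, axis=1, keepdims=True)
    sim = t @ s.T                      # shape: (n_target, n_source)
    return np.argsort(-sim, axis=1)[:, :k]

def target_probs(source_probs, mapping):
    """Average the mapped source-class probabilities per target class, then renormalize."""
    agg = source_probs[mapping].mean(axis=1)
    return agg / agg.sum()

# Toy example: 4 source classes, 2 target classes, 2 source classes per target class.
source_emb = np.eye(4)
target_emb = np.array([[1.0, 0.1, 0.0, 0.0],
                       [0.0, 0.0, 0.2, 1.0]])
mapping = similarity_label_map(source_emb, target_emb, k=2)

# Output of a frozen source model on the reprogrammed input (made-up values).
source_probs = np.array([0.1, 0.2, 0.3, 0.4])
probs = target_probs(source_probs, mapping)
```

In a full system, `delta` would be optimized by backpropagating a target-task loss through the frozen pretrained model; that training loop is omitted here.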