Automatic speech recognition (ASR) has recently become an important challenge for deep learning (DL). It requires large-scale training datasets and substantial computational and storage resources. Moreover, DL techniques, and machine learning (ML) approaches in general, assume that training and testing data come from the same domain, with the same input feature space and data distribution characteristics. This assumption, however, does not hold in some real-world artificial intelligence (AI) applications. In addition, there are situations where real data are difficult or expensive to gather, or where the events of interest occur rarely, so the data requirements of DL models cannot be met. Deep transfer learning (DTL) has been introduced to overcome these issues; it helps develop high-performing models using real datasets that are small, or that differ slightly from but are related to the training data. This paper presents a comprehensive survey of DTL-based ASR frameworks to shed light on the latest developments and to help academics and professionals understand current challenges. Specifically, after the DTL background is presented, a well-designed taxonomy is adopted to organize the state of the art. A critical analysis is then conducted to identify the limitations and advantages of each framework. Finally, a comparative study is introduced to highlight the current challenges before deriving opportunities for future research.