With machine learning models increasingly deployed on edge and Internet-of-Things (IoT) devices, running advanced models on such resource-constrained hardware remains challenging. Transformer models, a currently dominant neural architecture, have achieved great success across broad domains, but their complexity hinders their deployment on IoT devices with limited computation capability and storage. Although many model compression approaches have been explored, they often suffer from severe performance degradation. To address this issue, we introduce a new method, namely Transformer Re-parameterization, to boost the performance of lightweight Transformer models. It consists of two processes: a High-Rank Factorization (HRF) process in the training stage and a deHigh-Rank Factorization (deHRF) process in the inference stage. In the former, we insert an additional linear layer before the Feed-Forward Network (FFN) of the lightweight Transformer; the inserted HRF layer is expected to enhance the model's learning capability. In the latter, the auxiliary HRF layer is merged with the following FFN layer into a single linear layer, thereby recovering the original structure of the lightweight model. To examine the effectiveness of the proposed method, we evaluate it on three widely used Transformer variants, i.e., ConvTransformer, Conformer, and SpeechFormer, for speech emotion recognition on the IEMOCAP, M3ED, and DAIC-WOZ datasets. Experimental results show that the proposed method consistently improves the performance of lightweight Transformers, even making them comparable to large models. The proposed re-parameterization approach enables advanced Transformer models to be deployed on resource-constrained IoT devices.
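The deHRF merge described above rests on a standard identity: two consecutive linear layers with no nonlinearity between them are equivalent to a single linear layer. A minimal NumPy sketch of this merge follows; it is an illustration under assumed (hypothetical) layer dimensions, not the authors' implementation:

```python
import numpy as np

# Sketch of the deHRF merge: the inserted HRF linear layer is folded into
# the following FFN linear layer, since
#   W2 @ (W1 @ x + b1) + b2 == (W2 @ W1) @ x + (W2 @ b1 + b2).
# All dimensions below are hypothetical, chosen only for illustration.
rng = np.random.default_rng(0)
d_model, d_hrf, d_ffn = 8, 32, 16

W1 = rng.standard_normal((d_hrf, d_model))  # inserted HRF layer (training only)
b1 = rng.standard_normal(d_hrf)
W2 = rng.standard_normal((d_ffn, d_hrf))    # first linear layer of the FFN
b2 = rng.standard_normal(d_ffn)

# Training-time forward pass: two consecutive linear layers.
x = rng.standard_normal(d_model)
y_train = W2 @ (W1 @ x + b1) + b2

# Inference-time merge (deHRF): collapse into one linear layer,
# recovering the lightweight model's original structure.
W_merged = W2 @ W1                # shape (d_ffn, d_model)
b_merged = W2 @ b1 + b2
y_infer = W_merged @ x + b_merged

assert np.allclose(y_train, y_infer)  # merged layer is exactly equivalent
```

Note that this equivalence holds only because no activation function sits between the HRF layer and the FFN layer it merges into; any nonlinearity in between would break the identity.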