Parameter-Efficient Fine-Tuning (PEFT) is increasingly recognized as an effective approach in speech processing. However, which PEFT method works best, and where in the model it should be placed, remains an open question. We conduct extensive experiments comparing different PEFT methods and their layer-wise placement by adapting Differentiable Architecture Search (DARTS). We also explore ensemble learning as a way to leverage diverse PEFT strategies. The results show that DARTS does not outperform the baseline of inserting the same PEFT method into every layer of a Self-Supervised Learning (SSL) model. In contrast, ensemble learning, particularly with majority voting, achieves superior performance. Our statistical evidence indicates that different PEFT methods learn in different ways, which may explain why combining them through ensemble learning harnesses their distinct learning capabilities more effectively than optimizing the choice of method layer by layer.
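As an illustration of the majority-voting ensemble mentioned above, the following is a minimal sketch, not the authors' implementation. It assumes a classification task in which each separately fine-tuned model emits one integer label per example. The function name `majority_vote`, the variables `method_a_preds`, `method_b_preds`, `method_c_preds`, and the tie-breaking rule (the earliest model in the list wins) are all hypothetical choices made for this example.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-model class predictions by majority voting.

    predictions[m][i] is the label predicted by model m for example i.
    Ties are broken in favour of the earliest model in the list
    (an assumption of this sketch, not something stated in the source).
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_examples = len(predictions[0])
    if any(len(p) != n_examples for p in predictions):
        raise ValueError("all models must predict the same number of examples")

    voted = []
    for i in range(n_examples):
        votes = [p[i] for p in predictions]
        counts = Counter(votes)
        top = max(counts.values())
        # Pick the first label, in model order, that reaches the top count.
        voted.append(next(v for v in votes if counts[v] == top))
    return voted


# Hypothetical label predictions from three models, each fine-tuned
# with a different PEFT method, on the same four test examples.
method_a_preds = [0, 1, 2, 1]
method_b_preds = [0, 2, 2, 1]
method_c_preds = [1, 1, 2, 0]

ensemble = majority_vote([method_a_preds, method_b_preds, method_c_preds])
print(ensemble)
```

Voting over hard labels requires only the predictions of each model, so the individual PEFT models can be trained and stored independently; the ensemble benefits when their errors fall on different examples.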