Simultaneous Speech Translation (SimulST) generates target-language text while continuously processing streaming speech input, posing significant real-time challenges. Multi-task learning is often employed to enhance SimulST performance, but it introduces optimization conflicts between the primary and auxiliary tasks, potentially compromising overall efficiency. Existing model-level conflict resolution methods are ill-suited to this task: they exacerbate these inefficiencies and incur high GPU memory consumption. To address these challenges, we propose a Modular Gradient Conflict Mitigation (MGCM) strategy that detects conflicts at a finer-grained, modular level and resolves them via gradient projection. Experimental results demonstrate that MGCM significantly improves SimulST performance, particularly under medium- and high-latency conditions, and achieves a 0.68 BLEU gain on the offline task. Moreover, MGCM reduces GPU memory consumption by over 95\% compared with other conflict mitigation methods, establishing it as a robust solution for SimulST tasks.
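The gradient-projection step can be sketched as follows. This is a minimal PCGrad-style illustration, not the paper's implementation: function and module names are hypothetical, and a conflict is assumed to mean a negative dot product between the auxiliary and primary gradients of the same module.

```python
import numpy as np

def project_conflicting(g_primary: np.ndarray, g_aux: np.ndarray) -> np.ndarray:
    """If the auxiliary gradient conflicts with the primary gradient
    (negative dot product), project it onto the plane orthogonal to
    the primary gradient; otherwise leave it unchanged."""
    dot = float(g_aux @ g_primary)
    if dot < 0.0:
        g_aux = g_aux - (dot / float(g_primary @ g_primary)) * g_primary
    return g_aux

def modular_mitigation(primary_grads: dict, aux_grads: dict) -> dict:
    """Resolve conflicts module by module (e.g. per encoder/decoder block),
    rather than over a single flattened whole-model gradient."""
    return {name: project_conflicting(primary_grads[name], aux_grads[name])
            for name in aux_grads}
```

Operating per module keeps only one module's gradients in play at a time, which is the intuition behind the reported memory savings over whole-model projection methods.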