When translating words that refer to the speaker, speech translation (ST) systems should neither resort to default masculine generics nor rely on potentially misleading vocal traits. Rather, they should assign gender according to the speakers' preferences. Existing solutions, though effective, are hardly feasible in practice because they require dedicated model re-training on gender-labeled ST data. To overcome these limitations, we propose the first inference-time solution to control speaker-related gender inflections in ST. Our approach partially replaces the (biased) internal language model (LM) implicitly learned by the ST decoder with gender-specific external LMs. Experiments on en→es/fr/it show that, for feminine forms, our solution outperforms the base models and the best training-time mitigation strategy by up to 31.0 and 1.6 points in gender accuracy, respectively. The gains are even larger (up to 32.0 and 3.4 points) in the challenging condition where speakers' vocal traits conflict with their gender.
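To make the idea of partially replacing the internal LM more concrete, the following is a minimal sketch of one decoding step. The abstract does not give the scoring formula, so the sketch assumes a common formulation: subtract a scaled internal-LM log-probability from the ST score and add a scaled external-LM log-probability. The function name `fuse_step`, the weights `beta_ilm` and `beta_elm`, and all probabilities in the toy example are hypothetical and chosen only for illustration.

```python
import math


def log_softmax(scores):
    """Normalize a list of scores into log-probabilities (numerically stable)."""
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - lse for s in scores]


def fuse_step(st_logp, ilm_logp, elm_logp, beta_ilm=0.5, beta_elm=0.5):
    """Combine per-token log-probabilities for a single decoding step.

    Assumed formulation (not taken from the abstract):
        score = log p_ST - beta_ilm * log p_ILM + beta_elm * log p_ELM
    With beta_ilm = beta_elm = 0 this reduces to the unmodified ST model.
    """
    combined = [
        s - beta_ilm * i + beta_elm * e
        for s, i, e in zip(st_logp, ilm_logp, elm_logp)
    ]
    return log_softmax(combined)


# Toy two-word vocabulary: Italian masculine vs. feminine form of "tired".
vocab = ["stanco", "stanca"]

# Hypothetical distributions for one step; none of these numbers are real results.
st_logp = [math.log(0.7), math.log(0.3)]    # ST decoder prefers the masculine form
ilm_logp = [math.log(0.8), math.log(0.2)]   # estimated internal LM (masculine bias)
elm_logp = [math.log(0.1), math.log(0.9)]   # external LM for the feminine setting

baseline = fuse_step(st_logp, ilm_logp, elm_logp, beta_ilm=0.0, beta_elm=0.0)
fused = fuse_step(st_logp, ilm_logp, elm_logp)

baseline_choice = vocab[baseline.index(max(baseline))]
fused_choice = vocab[fused.index(max(fused))]
```

In this toy setting the unmodified scores select the masculine form, while the fused scores select the feminine one. How the internal LM is estimated and how the weights are tuned are left open here, since the abstract does not specify them.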