Because publicly available diarization data are scarce, model performance can be improved by training a single model on data from several domains. In this work, we propose incorporating domain information to train a single end-to-end diarization model for multiple domains. First, we employ domain-adaptive training with parameter-efficient adapters for on-the-fly model reconfiguration. Second, we introduce an auxiliary domain classification task to make the diarization model more domain-aware. For seen domains, the combination of our proposed methods reduces the diarization error rate (DER) from 17.66% to 16.59% compared with the baseline. During inference, adapters for the ground-truth domain are not available for unseen domains. We demonstrate that our model generalizes better to unseen domains when the adapters are removed. For two unseen domains, this reduces the DER from 39.91% to 23.09% and from 25.32% to 18.76%, respectively, compared with the baseline.
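To put the reported figures on a common scale, the following minimal sketch computes the absolute and relative DER reductions implied by the numbers quoted in the abstract. The helper `der_reduction` and the labels "unseen domain 1" and "unseen domain 2" are hypothetical names introduced here for illustration; the abstract does not name the unseen domains, and the relative reductions are derived values, not results stated by the authors.

```python
def der_reduction(baseline, proposed):
    """Return (absolute, relative) DER reduction, both rounded to 2 decimals.

    absolute: difference in DER percentage points.
    relative: that difference as a percentage of the baseline DER.
    """
    absolute = baseline - proposed
    relative = 100.0 * absolute / baseline
    return round(absolute, 2), round(relative, 2)


# (baseline DER %, proposed DER %) pairs as quoted in the abstract.
# The dictionary keys are placeholder labels, not names from the paper.
reported = {
    "seen domains": (17.66, 16.59),
    "unseen domain 1": (39.91, 23.09),
    "unseen domain 2": (25.32, 18.76),
}

for name, (baseline, proposed) in reported.items():
    absolute, relative = der_reduction(baseline, proposed)
    print(f"{name}: {baseline}% -> {proposed}% "
          f"(absolute {absolute} points, relative {relative}%)")
```

Reporting both quantities makes it easier to see that the gain on seen domains is modest while the gains on the two unseen domains are considerably larger.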