Neural domain alignment for spoken language recognition based on optimal transport

Domain shift significantly degrades the performance of cross-domain spoken language recognition (SLR). Unsupervised domain adaptation (UDA) algorithms have been explored to address domain shift in SLR without relying on class labels in the target domain. One successful UDA approach learns domain-invariant representations that align the feature distributions of the two domains. However, ignoring the class structure while learning domain-invariant representations can cause over-alignment, which harms the classification task. To overcome this limitation, we propose an optimal transport (OT)-based UDA algorithm for cross-domain SLR that exploits OT's awareness of the geometric structure of distributions. During domain alignment, the algorithm uses an OT-based discrepancy measure defined on the joint distribution of feature and label information. Our previous study found that completely aligning the source and target distributions can introduce negative transfer, in which samples of a source-domain class, or of a class irrelevant to the target domain, are mapped to a different class in the target domain. This negative transfer degrades the performance of the adapted model. To mitigate this issue, we introduce coupling-weighted partial optimal transport (POT) into our UDA framework for SLR, which adaptively applies soft weights to the OT coupling according to the transport cost during domain alignment. We evaluated the proposed UDA algorithm on a cross-channel SLR task, where it significantly improved performance over existing UDA algorithms.
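The following is a minimal sketch of the idea, not the authors' implementation. It assumes a joint cost that adds a squared feature distance to a cross-entropy term between the source label and the target prediction, an entropy-regularized (Sinkhorn) solver with uniform marginals, and an exponential soft weight `exp(-cost / tau)` on the coupling. The function names and the parameters `alpha`, `beta`, `reg`, and `tau` are all hypothetical; the abstract does not specify the actual cost or weighting rule.

```python
import math

def joint_cost(src_feats, src_labels, tgt_feats, tgt_probs, alpha=1.0, beta=1.0):
    """Cost of pairing source sample i with target sample j (assumed form):
    alpha * squared feature distance + beta * cross-entropy between the
    source label and the target class posterior."""
    cost = []
    for f_s, y_s in zip(src_feats, src_labels):
        row = []
        for f_t, p_t in zip(tgt_feats, tgt_probs):
            feat_dist = sum((a - b) ** 2 for a, b in zip(f_s, f_t))
            label_loss = -math.log(max(p_t[y_s], 1e-12))
            row.append(alpha * feat_dist + beta * label_loss)
        cost.append(row)
    return cost

def sinkhorn(cost, reg=0.5, n_iters=200):
    """Entropy-regularized OT coupling with uniform marginals.
    Column sums match the target marginal after the final update; row sums
    approach the source marginal as the number of iterations grows."""
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n
    b = [1.0 / m] * m
    kernel = [[math.exp(-c / reg) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]

def cost_weighted_coupling(coupling, cost, tau=1.0):
    """Soft-weight each coupling entry by exp(-cost / tau) (assumed rule), so
    expensive pairings carry less mass and only part of the mass is moved."""
    return [[g * math.exp(-c / tau) for g, c in zip(g_row, c_row)]
            for g_row, c_row in zip(coupling, cost)]

def transport_loss(coupling, cost):
    """Alignment loss: total cost of moving mass according to the coupling."""
    return sum(g * c for g_row, c_row in zip(coupling, cost)
               for g, c in zip(g_row, c_row))

# Toy data: three source utterances (two of class 0, one of class 1) and two
# unlabeled target utterances with predicted class posteriors.
src_feats = [[0.0, 0.0], [0.1, 0.0], [2.0, 2.0]]
src_labels = [0, 0, 1]
tgt_feats = [[0.05, 0.0], [2.1, 1.9]]
tgt_probs = [[0.9, 0.1], [0.2, 0.8]]

cost = joint_cost(src_feats, src_labels, tgt_feats, tgt_probs)
coupling = sinkhorn(cost)
weighted = cost_weighted_coupling(coupling, cost)

full_loss = transport_loss(coupling, cost)
partial_loss = transport_loss(weighted, cost)
```

In this toy setup the uniform marginals force some class-0 source mass onto the class-1-like target sample, which is the kind of mismatched mapping the abstract calls negative transfer. The soft weights shrink exactly those high-cost entries, so `partial_loss` is dominated by the well-matched pairs. In a real system the features and posteriors would come from the SLR network and the loss would be backpropagated through it.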
