Spoken language change detection (LCD) refers to identifying the language transitions in a code-switched utterance. Similarly, identifying the speaker transitions in a multispeaker utterance is known as speaker change detection (SCD). Since the two tasks are similar, an architecture or framework developed for SCD may also be suitable for LCD. Hence, the aim of the present work is to develop LCD systems inspired by SCD. As a first step, a study is conducted in which human listeners perform both LCD and SCD. The study suggests that, compared with SCD, humans performing LCD require (a) a longer duration around the change point and (b) prior exposure to the specific languages. The longer-duration requirement is incorporated by increasing the analysis window length of the unsupervised distance-based approach, which leads to relative performance improvements of 29.1% and 2.4% on the synthetic and practical code-switched datasets, respectively. Incorporating a priori language knowledge provides relative improvements of 31.63% and 14.27% on the same two datasets, respectively. The performance gap between the practical and synthetic datasets is mostly due to differences in the distribution of monolingual segment durations.
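To make the role of the analysis window concrete, the following is a minimal sketch of a generic unsupervised distance-based change detector. It is not the system evaluated in this work: the abstract does not specify the features, the distance measure, or the peak-picking rule, so the diagonal-Gaussian symmetric KL divergence, the function names (`symmetric_kl_diag`, `distance_contour`, `detect_changes`), and the `window` and `threshold` parameters are all illustrative assumptions.

```python
import numpy as np


def symmetric_kl_diag(left, right, eps=1e-6):
    """Symmetric KL divergence between diagonal Gaussians fitted to two frame blocks.

    Assumption: a diagonal Gaussian per window; the source does not name a distance.
    """
    mu_l, mu_r = left.mean(axis=0), right.mean(axis=0)
    var_l = left.var(axis=0) + eps  # eps keeps the variances strictly positive
    var_r = right.var(axis=0) + eps
    diff_sq = (mu_l - mu_r) ** 2
    kl_lr = 0.5 * np.sum(var_l / var_r + diff_sq / var_r - 1.0 + np.log(var_r / var_l))
    kl_rl = 0.5 * np.sum(var_r / var_l + diff_sq / var_l - 1.0 + np.log(var_l / var_r))
    return kl_lr + kl_rl


def distance_contour(features, window):
    """Distance between the `window` frames before and after each candidate frame.

    `features` is a (num_frames, dim) array; frames without a full window on
    both sides keep a distance of zero.
    """
    n_frames = features.shape[0]
    contour = np.zeros(n_frames)
    for t in range(window, n_frames - window + 1):
        contour[t] = symmetric_kl_diag(features[t - window:t], features[t:t + window])
    return contour


def detect_changes(features, window, threshold):
    """Return (peaks, contour): local maxima of the contour that exceed `threshold`."""
    contour = distance_contour(features, window)
    peaks = []
    for t in range(1, len(contour) - 1):
        is_local_max = contour[t] >= contour[t - 1] and contour[t] > contour[t + 1]
        if is_local_max and contour[t] > threshold:
            peaks.append(t)
    return peaks, contour
```

In this sketch, `window` is the analysis window length in frames. A larger value fits each Gaussian on more context on either side of the candidate change point, which is one way to read the longer-duration requirement described above, at the cost of being unable to resolve segments shorter than the window.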