Representations from self-supervised speech models (S3Ms) are known to be sensitive to phonemic contrasts, but their sensitivity to prosodic contrasts has not been directly measured. The ABX discrimination task has been used to measure phonemic contrast in S3M representations via minimal pairs. We introduce prosodic ABX, an extension of this framework that evaluates prosodic contrast using only a handful of examples and no explicit labels. We also build and release a dataset of English and Japanese minimal pairs and use it, together with a Mandarin dataset, to evaluate contrast in English stress, Japanese pitch accent, and Mandarin tone. Finally, we show that model and layer rankings are often preserved across several experimental conditions, which makes the method practical for low-resource settings.
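To make the ABX discrimination task concrete, the following is a minimal sketch of how an ABX error rate could be computed over minimal-pair triples. It is an illustration under stated assumptions, not the procedure used in this work: each item is assumed to be a (frames × dimensions) array of model features, items are mean-pooled over time, and cosine distance is used for comparison. Published ABX setups often compare frame sequences with an alignment such as dynamic time warping instead. The names `mean_pool`, `cosine_distance`, and `abx_error_rate` are hypothetical.

```python
import numpy as np


def mean_pool(frames):
    """Collapse a (frames x dims) feature array into one vector (an assumption)."""
    return np.asarray(frames, dtype=float).mean(axis=0)


def cosine_distance(u, v):
    """Cosine distance between two non-zero vectors."""
    return 1.0 - float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def abx_error_rate(triples):
    """Fraction of (A, B, X) triples in which X is not closer to A than to B.

    By convention here, X belongs to the same category as A, so the
    triple counts as correct when d(A, X) < d(B, X). Ties count as half
    an error.
    """
    errors = 0.0
    for a, b, x in triples:
        a_vec, b_vec, x_vec = mean_pool(a), mean_pool(b), mean_pool(x)
        d_ax = cosine_distance(a_vec, x_vec)
        d_bx = cosine_distance(b_vec, x_vec)
        if d_ax > d_bx:
            errors += 1.0
        elif d_ax == d_bx:
            errors += 0.5
    return errors / len(triples)


# Toy two-dimensional "features" standing in for S3M layer outputs.
a = [[1.0, 0.0], [1.0, 0.0]]
b = [[0.0, 1.0], [0.0, 1.0]]
x_near_a = [[0.9, 0.1]]
x_near_b = [[0.1, 0.9]]

print(abx_error_rate([(a, b, x_near_a), (a, b, x_near_b)]))
```

In the prosodic setting, A and B would be the two members of a minimal pair that differ only in stress, pitch accent, or tone, and X would be another token of the same category as A. Lower error rates indicate that the representation separates the contrast more reliably.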