For the SLT 2024 Low-Resource Dysarthria Wake-Up Word Spotting (LRDWWS) Challenge, we introduce the PB-LRDWWS system. The system combines a dysarthric speech content feature extractor, used for prototype construction, with a prototype-based classification method. The feature extractor is a HuBERT model fine-tuned in three stages with cross-entropy loss. The fine-tuned HuBERT extracts features from the target dysarthric speaker's enrollment speech to build prototypes. Classification is performed by computing the cosine similarity between the HuBERT features of the target speaker's evaluation speech and the prototypes. Despite its simplicity, the method proves effective in our experiments: the system ranked second on the final Test-B of the LRDWWS Challenge.
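
The prototype-based classification step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that frame-level features are mean-pooled over time, that each prototype is the average of the pooled enrollment features for one keyword, and that the feature arrays come from some upstream extractor such as the fine-tuned HuBERT. The function names `build_prototypes` and `classify` are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def build_prototypes(enroll_feats, enroll_labels):
    """Build one prototype per keyword from enrollment utterances.

    enroll_feats: list of (T_i, D) arrays of frame-level features.
    enroll_labels: list of keyword labels, one per utterance.
    Assumption: mean pooling over time, then averaging per keyword.
    """
    pooled = {}
    for feats, label in zip(enroll_feats, enroll_labels):
        pooled.setdefault(label, []).append(feats.mean(axis=0))
    labels = sorted(pooled)
    protos = np.stack([np.mean(pooled[label], axis=0) for label in labels])
    return labels, l2_normalize(protos)


def classify(eval_feats, labels, protos):
    """Return the label whose prototype is most cosine-similar to the query."""
    query = l2_normalize(eval_feats.mean(axis=0))
    # Both sides are unit length, so the dot product is the cosine similarity.
    sims = protos @ query
    return labels[int(np.argmax(sims))], sims
```

Because the prototypes are built only from the target speaker's enrollment speech, a sketch like this needs no further training at enrollment time; adding a new speaker only requires recomputing the prototypes.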