We present STT4SG-350 (Speech-to-Text for Swiss German), a corpus of Swiss German speech annotated with Standard German text at the sentence level. The data was collected using a web app in which speakers are shown Standard German sentences, which they translate into Swiss German and record. We make the corpus publicly available. It contains 343 hours of speech from all dialect regions and is the largest public speech corpus for Swiss German to date. Application areas include automatic speech recognition (ASR), text-to-speech, dialect identification, and speaker recognition. Dialect information, age group, and gender are provided for the 316 speakers. Genders are equally represented and the corpus includes speakers of all ages. Roughly the same amount of speech is provided per dialect region, which makes the corpus ideally suited for experiments with speech technology for different dialects. We provide training, validation, and test splits of the data. The test set consists of the same sentences spoken in each dialect region, which allows a fair evaluation of the quality of speech technologies across dialects. We train an ASR model on the training set and achieve an average BLEU score of 74.7 on the test set. The model beats the best published BLEU scores on two other Swiss German ASR test sets, demonstrating the quality of the corpus.