In this paper, we introduce the Fongbe to French Speech Translation Corpus (FFSTC). The corpus comprises approximately 31 hours of Fongbe speech recordings paired with French transcriptions. It was compiled using several collection methods and the efforts of dedicated individuals. To evaluate the quality and validity of the data, we conduct baseline experiments with Fairseq's transformer_s and conformer models. The transformer_s model achieves a score of 8.96 and the conformer model a score of 8.14, establishing a baseline for the FFSTC corpus.