Conversational automatic speech recognition in Hungarian is constrained by the limited amount of publicly available dialogue-style training data. The BEA-Dialogue corpus addresses this need, but its strictly speaker-disjoint train/dev/eval split reduces the usable material to only 85 hours. In this paper, we introduce BEA-Dialogue+, an expanded version of the corpus that relaxes the split criterion for experimenters and dialogue partners while preserving complete separation of the primary speakers. This results in 200 hours of transcribed natural conversations and enables a controlled study of the trade-off between additional training data and speaker overlap across the splits. We evaluate several Whisper- and FastConformer-based models on both corpus versions, including Serialized Output Training (SOT)-based fine-tuning for dialogue transcription. Our results show that the larger corpus is more challenging for models without fine-tuning, whereas SOT-based adaptation yields consistent improvements in WER, CER, cpWER, and cpCER. Overall, BEA-Dialogue+ provides a substantially larger yet still demanding benchmark for Hungarian dialogue ASR, and a practical resource for training and evaluating dialogue transcription systems.
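Of the four metrics, cpWER is the least standard. It is commonly defined as the concatenated minimum-permutation word error rate: each speaker's utterances are concatenated, and the hypothesis-to-reference speaker assignment with the fewest word errors is scored. The following is a minimal sketch of that common definition, not the scoring tool used in this work. The function names (`edit_distance`, `cp_wer`) and the dictionary-based input format are assumptions made for illustration.

```python
from itertools import permutations


def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cp_wer(ref_by_speaker, hyp_by_speaker):
    """Concatenated minimum-permutation WER (illustrative sketch).

    Both arguments map a speaker label to that speaker's utterances
    in time order. Hypothesis labels need not match reference labels.
    """
    # Concatenate each speaker's utterances into one word stream.
    refs = [" ".join(utts).split() for utts in ref_by_speaker.values()]
    hyps = [" ".join(utts).split() for utts in hyp_by_speaker.values()]

    # Pad with empty streams so both sides have the same speaker count.
    n = max(len(refs), len(hyps))
    refs += [[] for _ in range(n - len(refs))]
    hyps += [[] for _ in range(n - len(hyps))]

    total_ref_words = sum(len(r) for r in refs)
    if total_ref_words == 0:
        raise ValueError("reference contains no words")

    # Try every speaker assignment and keep the one with the fewest errors.
    best_errors = min(
        sum(edit_distance(r, hyps[p]) for r, p in zip(refs, perm))
        for perm in permutations(range(n))
    )
    return best_errors / total_ref_words


if __name__ == "__main__":
    reference = {"A": ["jó napot", "köszönöm"], "B": ["szia"]}
    hypothesis = {"spk0": ["szia"], "spk1": ["jó napot", "köszönöm szépen"]}
    print(cp_wer(reference, hypothesis))
```

For a two-speaker dialogue only two assignments need to be checked, so the exhaustive search over permutations is cheap. A character-level variant (cpCER) follows the same scheme with characters in place of word tokens.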