LIP-RTVE: An Audiovisual Database for Continuous Spanish in the Wild

Speech is considered a multi-modal process in which hearing and vision are two fundamental pillars. In fact, several studies have demonstrated that the robustness of Automatic Speech Recognition systems can be improved when audio and visual cues are combined to represent the nature of speech. In addition, Visual Speech Recognition, an open research problem whose purpose is to interpret speech by reading the lips of the speaker, has been a focus of interest over the last decades. Nevertheless, large-scale databases are required to train these systems in the current Deep Learning era. Moreover, most of these databases are dedicated to English, while other languages lack sufficient resources. Thus, this paper presents a semi-automatically annotated audiovisual database to deal with unconstrained natural Spanish, providing 13 hours of data extracted from Spanish television. Furthermore, baseline results for both speaker-dependent and speaker-independent scenarios are reported using Hidden Markov Models, a traditional paradigm that has been widely used in the field of Speech Technologies.

翻译：语音被视为一种多模态过程，其中听觉和视觉是两个基本支柱。事实上，多项研究表明，当结合音频和视觉线索来表征语音的本质时，自动语音识别系统的鲁棒性可以得到提升。此外，视觉语音识别作为一个开放的研究问题，其目标是通过读取说话者的唇部动作来解读语音，在过去几十年中一直备受关注。然而，在当前深度学习时代，为了评估这些系统，需要大规模的数据库。另一方面，尽管大多数此类数据库针对英语，其他语言却缺乏足够的资源。因此，本文介绍了一个半自动标注的视听数据库，用于处理不受约束的自然西班牙语，提供了从西班牙电视节目中提取的13小时数据。此外，本文还报告了使用隐马尔可夫模型（语音技术领域广泛使用的传统范式）在说话者相关和说话者无关场景下的基线结果。