This paper presents a prototype Automatic Speech Recognition (ASR) system designed specifically for Bengali biomedical data. Recent advances in Bengali ASR are encouraging, but the lack of domain-specific data limits the development of practical healthcare ASR models. This project addresses that gap by developing an ASR system tailored to Bengali medical terms such as symptoms, severity levels, and diseases, covering two major dialects: Bengali and Sylheti. We train and evaluate two popular ASR frameworks on a comprehensive 46-hour Bengali medical corpus. Our core objective is to create deployable health-domain ASR systems for digital health applications, ultimately increasing accessibility for non-technical users in the healthcare sector.