Building Robust and Scalable Multilingual ASR for Indian Languages

19 November 2025

Arjun Gangwar, Kaousheik Jayakumar, S. Umesh

This paper describes the systems developed by SPRING Lab, Indian Institute of Technology Madras, for the ASRU MADASR 2.0 challenge. The systems focus on adapting ASR to better predict the language and dialect of an utterance among 8 languages spanning 33 dialects. We participated in Track 1 and Track 2, which restrict the use of additional data and require multilingual systems to be developed from scratch. We present a novel training approach using a Multi-Decoder architecture with a phonemic Common Label Set (CLS) as the intermediate representation. It improved performance over the baseline in the CLS space. We also discuss various methods used to retain the gain obtained in the phonemic space when converting the output back to the corresponding grapheme representations. In Track 2, our systems beat the baseline in 3 languages in terms of WER/CER and achieved the highest language ID and dialect ID accuracy among all participating teams.
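The abstract reports results as WER/CER. As a minimal sketch of how these metrics are conventionally defined (edit distance divided by reference length), the Python below may help. It is not the challenge's scoring script; the function names `edit_distance`, `wer`, and `cer` are illustrative, and the choice to drop spaces and count Unicode code points in `cer` is an assumption, since scoring tools differ on both.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit-cost edits)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate over code points, with spaces removed (assumption)."""
    ref_chars = list(reference.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    hyp_chars = list(hypothesis.replace(" ", ""))
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    print(wer("the cat sat", "the bat sat"))
    print(cer("abc", "abd"))
```

For Indic scripts, counting code points treats vowel signs and conjunct components as separate characters; a scorer that counts grapheme clusters instead would give different CER values.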

