The digitization of agricultural advisory services in India requires robust Automatic Speech Recognition (ASR) systems that can accurately transcribe domain-specific terminology in multiple Indian languages. This paper presents a benchmarking framework for evaluating ASR performance in agricultural contexts across Hindi, Telugu, and Odia. We introduce evaluation metrics, including Agriculture Weighted Word Error Rate (AWWER) and domain-specific utility scoring, to complement traditional metrics. Our evaluation of 10,934 audio recordings, each transcribed by up to 10 ASR models, reveals performance variations across languages and models: Hindi achieves the best overall performance (WER: 16.2%), while Odia presents the greatest challenges (best WER: 35.1%, achieved only with speaker diarization). We characterize the audio quality challenges inherent to real-world agricultural field recordings and demonstrate that speaker diarization with best-speaker selection can substantially reduce WER for multi-speaker recordings (by up to 66%, depending on the proportion of multi-speaker audio). We identify recurring error patterns in agricultural terminology and provide practical recommendations for improving ASR systems in low-resource agricultural domains. The study establishes baseline benchmarks for future agricultural ASR development.
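To make the relationship between WER and a domain-weighted variant concrete, the following is a minimal sketch. The abstract does not give the AWWER formula, so the weighting scheme here is an assumption for illustration only: errors on reference words listed in a hypothetical `term_weights` map cost more than errors on other words, and the total is normalized by the summed reference weights. With no weights supplied, the function reduces to standard word-level WER. The function name, the weight values, and the whitespace tokenization are all illustrative choices, not the paper's method.

```python
from typing import Dict, List, Optional


def weighted_wer(
    reference: str,
    hypothesis: str,
    term_weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted word error rate (illustrative; not the paper's AWWER definition).

    Substituting or deleting a reference word costs that word's weight
    (default 1.0). Inserting an extra hypothesis word costs 1.0.
    The result is the minimum alignment cost divided by the total
    reference weight, so an empty `term_weights` yields standard WER.
    """
    weights = term_weights or {}
    ref: List[str] = reference.split()  # assumption: whitespace tokenization
    hyp: List[str] = hypothesis.split()

    def w(token: str) -> float:
        return weights.get(token, 1.0)

    total_weight = sum(w(t) for t in ref)
    if total_weight == 0:
        raise ValueError("reference must contain at least one weighted word")

    # dp[i][j] = minimum cost of aligning ref[:i] with hyp[:j]
    dp = [[0.0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        dp[i][0] = dp[i - 1][0] + w(ref[i - 1])  # delete every reference word
    for j in range(1, len(hyp) + 1):
        dp[0][j] = dp[0][j - 1] + 1.0  # insert every hypothesis word

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            match_or_sub = 0.0 if ref[i - 1] == hyp[j - 1] else w(ref[i - 1])
            dp[i][j] = min(
                dp[i - 1][j - 1] + match_or_sub,  # match or substitution
                dp[i - 1][j] + w(ref[i - 1]),     # deletion
                dp[i][j - 1] + 1.0,               # insertion
            )

    return dp[len(ref)][len(hyp)] / total_weight


if __name__ == "__main__":
    ref = "apply urea to the paddy field"
    hyp = "apply area to the paddy field"  # one domain term misrecognized
    print(weighted_wer(ref, hyp))                  # plain WER
    print(weighted_wer(ref, hyp, {"urea": 3.0}))   # hypothetical term weight
```

Under this sketch, a single misrecognized domain term raises the weighted score more than a misrecognized function word would, which is the kind of behavior a domain-weighted metric is meant to capture; the actual weights and normalization used for AWWER should be taken from the paper's metric definition.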