In 2022, the U.S. National Institute of Standards and Technology (NIST) conducted the latest Language Recognition Evaluation (LRE), part of an ongoing series that NIST has administered since 1996 to foster research in language recognition and to measure state-of-the-art technology. Like previous LREs, LRE22 focused on conversational telephone speech (CTS) and broadcast narrowband speech (BNBS) data. LRE22 also introduced new evaluation features, such as an emphasis on African languages, including low-resource languages, and a test set consisting of segments containing between 3 s and 35 s of speech, randomly sampled and extracted from longer recordings. A total of 21 research organizations, forming 16 teams, participated in this 3-month-long evaluation and made a total of 65 valid system submissions. This paper presents an overview of LRE22 and an analysis of system performance over different evaluation conditions. The evaluation results suggest that Oromo and Tigrinya are easier to detect, while Xhosa and Zulu are more challenging. Greater confusability is seen for some language pairs. System performance improved significantly as speech duration increased up to a certain duration, after which diminishing returns were observed.