The UK COVID-19 Vocal Audio Dataset is designed for the training and evaluation of machine learning models that classify SARS-CoV-2 infection status or associated respiratory symptoms using vocal audio. The UK Health Security Agency recruited voluntary participants through the national Test and Trace programme and the REACT-1 survey in England from March 2021 to March 2022, a period in which the Alpha and Delta SARS-CoV-2 variants and some Omicron variant sublineages were the dominant strains in transmission. Audio recordings of volitional coughs, exhalations, and speech were collected in the 'Speak up to help beat coronavirus' digital survey alongside demographic, self-reported symptom, and respiratory condition data, and were linked to SARS-CoV-2 test results. The UK COVID-19 Vocal Audio Dataset represents the largest collection of SARS-CoV-2 PCR-referenced audio recordings to date. PCR results were linked to 70,794 of 72,999 participants and 24,155 of 25,776 positive cases. Respiratory symptoms were reported by 45.62% of participants. The dataset has additional potential uses for bioacoustics research, with 11.30% of participants reporting asthma and 27.20% having linked influenza PCR test results.