While crowdsourcing is an established solution for facilitating and scaling the collection of speech data, the involvement of non-experts necessitates protocols to ensure final data quality. To reduce the costs of these essential controls, this paper investigates the use of Speech Foundation Models (SFMs) to automate the validation process, examining for the first time the cost/quality trade-off in data acquisition. Experiments conducted on French, German, and Korean data demonstrate that SFM-based validation has the potential to reduce reliance on human validation, resulting in an estimated cost saving of over 40.0% without degrading final data quality. These findings open new opportunities for more efficient, cost-effective, and scalable speech data acquisition.