This work introduces BRILLsson, a novel representation learning model based on binary neural networks for a broad range of non-semantic speech tasks. We train the model by knowledge distillation from a large, real-valued TRILLsson model, using only a fraction of the dataset used to train TRILLsson. The resulting BRILLsson models are only 2 MB in size with a latency of less than 8 ms, making them suitable for deployment on low-resource devices such as wearables. We evaluate BRILLsson on eight benchmark tasks (including but not limited to spoken language identification, emotion recognition, health condition diagnosis, and keyword spotting) and demonstrate that our proposed ultra-light, low-latency models perform as well as large-scale models.
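To make the two ideas named in the abstract concrete, the following is a minimal sketch of weight binarization and an embedding-matching distillation loss. The abstract does not specify the binarization scheme, the architecture, or the loss, so everything here is an assumption for illustration: the sign-based binarization, the mean-squared-error loss, the function names, and the toy values are all hypothetical and are not taken from BRILLsson itself.

```python
from typing import List


def binarize(weights: List[float]) -> List[int]:
    """Map each real-valued weight to +1 or -1 (sign function; 0 maps to +1)."""
    return [1 if w >= 0.0 else -1 for w in weights]


def pack_bits(signs: List[int]) -> bytes:
    """Pack +1/-1 weights into bytes, one bit per weight (+1 -> 1, -1 -> 0)."""
    out = bytearray((len(signs) + 7) // 8)
    for i, s in enumerate(signs):
        if s == 1:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def binary_dot(signs: List[int], x: List[float]) -> float:
    """Dot product with binary weights: only additions and subtractions."""
    return sum(xi if s == 1 else -xi for s, xi in zip(signs, x))


def distillation_loss(student: List[float], teacher: List[float]) -> float:
    """Mean squared error between student and teacher embeddings.

    Assumed loss for illustration; the abstract does not name the objective.
    """
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(teacher)


# Hypothetical toy values, not taken from the paper.
real_weights = [0.7, -0.2, 0.0, -1.3, 0.4, 0.9, -0.5, 0.1, -0.8]
signs = binarize(real_weights)
packed = pack_bits(signs)  # 9 weights fit in 2 bytes instead of 36 bytes as float32
```

Under the assumption that every weight is stored as a single bit, a binary layer needs up to 32 times less storage than the same layer in 32-bit floating point, which is the kind of saving that makes a model of a few megabytes plausible. In a distillation setup of this shape, the real-valued teacher's embeddings would serve as the `teacher` argument and the binary student would be trained to reduce the loss.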