We present dual-attention neural biasing, an architecture designed to improve wake word (WW) recognition and reduce inference-time latency in speech recognition tasks. The architecture switches its runtime compute path dynamically: it uses WW spotting to select which branch of its attention networks to execute for each input audio frame. This approach improves WW spotting accuracy while reducing runtime compute cost, measured in floating-point operations (FLOPs). On an in-house de-identified dataset, we show that the proposed dual-attention network reduces compute cost by $90\%$ for WW audio frames with only a $1\%$ increase in the number of parameters. Compared to the baselines, the architecture improves WW F1 score by $16\%$ relative and reduces generic rare word error rate by $3\%$ relative.
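To make the per-frame switching and its effect on FLOPs concrete, the following is a minimal sketch of the accounting only, not the authors' implementation. The names `Branch`, `route_frames`, `total_flops`, and `relative_saving` are hypothetical, and the per-frame costs are illustrative assumptions chosen so that the WW branch costs $90\%$ less than the generic branch, matching the figure reported above. How the WW spotter produces its per-frame decisions and what each attention branch computes are not specified here.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Branch:
    """One attention branch with an assumed (hypothetical) per-frame cost."""
    name: str
    flops_per_frame: int


def route_frames(ww_flags: Sequence[bool], ww_branch: Branch,
                 generic_branch: Branch) -> List[Branch]:
    """Select one branch per frame from the WW spotting decision."""
    return [ww_branch if is_ww else generic_branch for is_ww in ww_flags]


def total_flops(routes: Sequence[Branch]) -> int:
    """Sum the cost of the branch actually executed for each frame."""
    return sum(branch.flops_per_frame for branch in routes)


def relative_saving(baseline_flops: int, actual_flops: int) -> float:
    """Fraction of compute saved relative to the baseline."""
    return 1.0 - actual_flops / baseline_flops


# Illustrative costs (assumptions, not measurements from the paper).
GENERIC = Branch("generic", flops_per_frame=1_000_000)
WW = Branch("wake_word", flops_per_frame=100_000)

# Hypothetical spotter output: the first two frames carry the wake word.
flags = [True, True, False, False, False]
routes = route_frames(flags, WW, GENERIC)

# Baseline: every frame runs the generic branch.
baseline = len(flags) * GENERIC.flops_per_frame
saving = relative_saving(baseline, total_flops(routes))
print(f"utterance-level saving: {saving:.0%}")
```

Under these assumed costs, each WW frame costs $90\%$ less than a generic frame, but the utterance-level saving depends on the fraction of frames that the spotter marks as WW, so it is smaller than $90\%$ whenever some frames take the generic branch.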