Hearing aids impose strict latency and power constraints that current DNN-based speech enhancement systems struggle to meet on embedded hardware. We characterize this gap by deploying speech separation and denoising models built on the lightweight SuDoRM-RF++ architecture on the AMD-Xilinx Kria KV260, evaluating each task at FP32 and 16-bit fixed-point precision. Across these configurations, first-sample latency tracks on-chip parameter caching rather than arithmetic throughput, identifying data movement as the primary bottleneck. Reducing precision from FP32 to 16-bit fixed point halves the model memory footprint without compromising objective speech quality. The fixed-point denoising accelerator achieves a first-sample latency of 9.7~ms, meeting the 10~ms clinical threshold, while speech separation reaches 16.0~ms. These measurements establish concrete resource requirements for embedded DNN-based speech enhancement and quantify the remaining gap to hearing aid deployment.
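The footprint claim follows from storage width alone: an FP32 weight occupies 4 bytes and a 16-bit fixed-point weight occupies 2. The sketch below illustrates this with a generic saturating, round-to-nearest conversion to a signed 16-bit Q-format. It is a minimal illustration, not the accelerator's actual quantizer. The 12 fractional bits (`FRAC_BITS`), the helper names `to_fixed16` and `from_fixed16`, and the toy weight values are all hypothetical assumptions, since the abstract does not state the fixed-point split or rounding scheme used.

```python
from array import array

# ASSUMPTION: signed 16-bit Q-format with 12 fractional bits (range about +/-8).
# The real accelerator's integer/fraction split is not given in the abstract.
FRAC_BITS = 12
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1


def to_fixed16(weights, frac_bits=FRAC_BITS):
    """Hypothetical helper: quantize floats to rounded, saturated int16 codes."""
    scale = 1 << frac_bits
    q = array("h")  # 'h' = signed 16-bit integer storage
    for w in weights:
        v = round(w * scale)                          # round to nearest code
        q.append(max(INT16_MIN, min(INT16_MAX, v)))   # saturate instead of wrapping
    return q


def from_fixed16(q, frac_bits=FRAC_BITS):
    """Hypothetical helper: map int16 codes back to floats to inspect error."""
    scale = 1 << frac_bits
    return [v / scale for v in q]


# Toy weights (illustrative only, not taken from the SuDoRM-RF++ model).
weights = array("f", [0.5, -1.25, 0.003, 3.9, -7.99])  # 'f' = 32-bit float storage
q = to_fixed16(weights)

fp32_bytes = len(weights) * weights.itemsize  # 4 bytes per FP32 value
fx16_bytes = len(q) * q.itemsize              # 2 bytes per int16 value
print(fp32_bytes, fx16_bytes)                 # 20 10: footprint halves

# Without saturation, rounding error stays within half a least-significant bit.
max_err = max(abs(a - b) for a, b in zip(weights, from_fixed16(q)))
print(max_err <= 0.5 / (1 << FRAC_BITS))      # True
```

Saturation rather than wraparound is the usual choice for DNN weights, because an out-of-range value clipped to the nearest representable code distorts the output far less than a sign flip would. The error check holds only for weights inside the representable range.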