In-Ear-Voice: Towards Milli-Watt Audio Enhancement With Bone-Conduction Microphones for In-Ear Sensing Platforms

The recent widespread adoption of remote conferencing has been accompanied by equally widespread frustration with distorted or otherwise unclear voice communication. Audio enhancement can compensate for low-quality input signals, for example from small true wireless earbuds, by applying noise suppression techniques. Such processing relies on low-latency voice activity detection (VAD) with the added capability of discriminating the wearer's voice from that of others, a task of significant computational complexity. The tight energy budget of devices as small as modern earphones, however, requires any system tackling this problem to do so with minimal power and processing overhead, and, for usability reasons, without relying on speaker-specific voice samples or training. This paper presents the design and implementation of a custom research platform for low-power wireless earbuds based on novel, commercially available MEMS bone-conduction microphones. Such microphones record the wearer's speech with much greater isolation, enabling personalized voice activity detection and further audio enhancement applications. The paper also evaluates a proposed low-power personalized speech detection algorithm, based on bone-conduction data and a recurrent neural network, running on the implemented research platform, and compares it to an approach based on conventional microphone input. The bone-conduction system detects speech within 12.8 ms at an accuracy of 95%. Different SoC choices are contrasted; the final implementation, based on the Ambiq Apollo 4 Blue SoC, achieves an average power consumption of 2.64 mW at 14 µJ per inference, reaching 43 h of battery life on a miniature 32 mAh Li-ion cell without duty cycling.
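The reported battery life can be sanity-checked with simple arithmetic. The sketch below is an idealized estimate, not the authors' measurement method: it divides the energy stored in the cell by the average power draw. The capacity (32 mAh) and average power (2.64 mW) come from the abstract; the nominal cell voltage of 3.7 V is an assumption (a typical Li-ion value) that the abstract does not state, and the function name `battery_life_hours` is purely illustrative.

```python
def battery_life_hours(capacity_mah: float,
                       nominal_voltage_v: float,
                       avg_power_mw: float) -> float:
    """Idealized runtime: stored energy (mWh) divided by average power (mW)."""
    if avg_power_mw <= 0:
        raise ValueError("average power must be positive")
    energy_mwh = capacity_mah * nominal_voltage_v  # mAh * V = mWh
    return energy_mwh / avg_power_mw


CAPACITY_MAH = 32.0       # from the abstract
AVG_POWER_MW = 2.64       # from the abstract
NOMINAL_VOLTAGE_V = 3.7   # ASSUMPTION: typical Li-ion nominal voltage, not given in the abstract

hours = battery_life_hours(CAPACITY_MAH, NOMINAL_VOLTAGE_V, AVG_POWER_MW)
print(f"Idealized battery life: {hours:.1f} h")
```

Under these assumptions the estimate comes out slightly above the reported 43 h. A gap of that size would be expected from effects the idealized model ignores, such as voltage-regulator losses and the usable fraction of the cell's rated capacity.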
