Determining the head orientation of a talker benefits speech signal processing applications such as source localization and speech enhancement, and it also facilitates intuitive voice control and interaction with smart environments or modern car assistants. Most approaches to head orientation estimation rely on visual cues, which requires camera systems that are often unavailable. We present an approach that uses only audio signals captured by a few microphones distributed around the talker. Specifically, we propose a novel method that directly incorporates measured or modeled speech radiation patterns and infers the talker's orientation during active speech periods using a cosine similarity measure. In addition, we propose an automatic gain adjustment technique for uncalibrated, irregular microphone setups such as ad-hoc sensor networks. In experiments with signals recorded in both anechoic and reverberant environments, the proposed method outperforms state-of-the-art approaches with either measured or modeled speech radiation patterns.
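To illustrate the idea of matching observed microphone levels against a radiation pattern with a cosine similarity measure, the following is a minimal sketch and not the method as implemented in this work. It makes several assumptions that the abstract does not state: a hypothetical first-order (cardioid-like) pattern `cardioid_pattern` stands in for a measured or modeled speech radiation pattern, all microphones are equidistant from the talker and perfectly calibrated (so the proposed gain adjustment is not covered), and orientation is restricted to a grid of candidate azimuths. The function and parameter names are illustrative only.

```python
import numpy as np


def cardioid_pattern(angle_rad, alpha=0.5):
    """Hypothetical radiation pattern: linear gain versus angle off the mouth axis.

    This is an assumed stand-in for a measured or modeled speech pattern.
    """
    return alpha + (1.0 - alpha) * np.cos(angle_rad)


def estimate_orientation(mic_azimuths_rad, observed_levels, candidates_rad,
                         pattern=cardioid_pattern):
    """Return the candidate azimuth whose predicted level vector is most
    similar (in the cosine sense) to the observed one, plus its similarity."""
    mics = np.asarray(mic_azimuths_rad, dtype=float)
    observed = np.asarray(observed_levels, dtype=float)
    candidates = np.asarray(candidates_rad, dtype=float)

    # Row i holds the levels predicted at every microphone if the talker
    # faced candidates[i]; shape is (num_candidates, num_mics).
    expected = pattern(mics[None, :] - candidates[:, None])

    # Cosine similarity is invariant to a common scale factor, so the
    # unknown speech level does not need to be estimated.
    scores = (expected @ observed) / (
        np.linalg.norm(expected, axis=1) * np.linalg.norm(observed)
    )
    best = int(np.argmax(scores))
    return candidates[best], scores[best]


# Synthetic example: six microphones on a circle, talker facing 60 degrees.
mic_azimuths = np.deg2rad(np.arange(0, 360, 60))
true_orientation = np.deg2rad(60)
speech_level = 3.7  # arbitrary unknown source level
observed = speech_level * cardioid_pattern(mic_azimuths - true_orientation)

candidates = np.deg2rad(np.arange(0, 360, 5))
estimate, similarity = estimate_orientation(mic_azimuths, observed, candidates)
print(np.rad2deg(estimate), similarity)
```

Because the similarity is normalized, only the shape of the level distribution across microphones matters. That is also why per-microphone gain mismatches, which distort this shape, would need to be compensated separately in an uncalibrated setup.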