Gestures performed alongside speech are essential to voice interaction because they convey complementary semantics for interaction purposes such as the wake-up state and the input modality. In this paper, we investigated voice-accompanying hand-to-face (VAHF) gestures for voice interaction. We targeted hand-to-face gestures because they relate closely to speech and produce significant acoustic features (e.g., by impeding voice propagation). We conducted a user study to explore the design space of VAHF gestures: we first gathered candidate gestures and then analyzed them structurally along several dimensions (e.g., contact position and contact type), which yielded 8 VAHF gestures with good usability and minimal confusion. To facilitate VAHF gesture recognition, we proposed a novel cross-device sensing method that leverages heterogeneous data channels (vocal, ultrasound, and IMU) from commodity devices (earbuds, watches, and rings). Excluding the "empty" gesture, our recognition model achieved an accuracy of 97.3% for recognizing 3 gestures and 91.5% for recognizing 8 gestures, indicating high practical applicability. Quantitative analysis also sheds light on the recognition capability of each sensor channel and of their different combinations. Finally, we illustrated feasible use cases and their design principles to show how our system can be applied in various scenarios.
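The abstract reports recognition results for each sensor channel and for their combinations. With three channels, this ablation covers 7 non-empty channel subsets. The sketch below is illustrative only and makes two assumptions of its own. It enumerates those subsets and fuses per-channel feature vectors by plain concatenation (early fusion). The channel names, the feature values, and the `channel_subsets`/`fuse_features` helpers are hypothetical. The sketch does not describe the authors' actual recognition model.

```python
from itertools import combinations

# Hypothetical channel identifiers for the three heterogeneous data channels.
CHANNELS = ("vocal", "ultrasound", "imu")


def channel_subsets(channels):
    """Return every non-empty combination of channels, smallest subsets first."""
    subsets = []
    for k in range(1, len(channels) + 1):
        subsets.extend(combinations(channels, k))
    return subsets


def fuse_features(features, subset):
    """Concatenate per-channel feature vectors for one subset.

    Assumption: simple early fusion by concatenation, shown only to
    illustrate how channel combinations could be compared.
    """
    fused = []
    for name in subset:
        fused.extend(features[name])
    return fused


if __name__ == "__main__":
    # Toy feature vectors for one gesture sample (made-up numbers).
    sample = {"vocal": [0.1, 0.2], "ultrasound": [0.3], "imu": [0.4, 0.5, 0.6]}
    for subset in channel_subsets(CHANNELS):
        print(subset, fuse_features(sample, subset))
```

In a real ablation, each fused vector would be passed to a classifier trained and evaluated separately for that subset. Concatenation is chosen here only because it is the simplest fusion scheme to show.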