Beyond-Voice: Towards Continuous 3D Hand Pose Tracking on Commercial Home Assistant Devices

The surging popularity of home assistants and their voice user interfaces (VUIs) has made them an ideal central control hub for smart home devices. However, current form factors rely heavily on the VUI, which poses accessibility and usability issues; some of the latest models add cameras and displays, which are costly and raise privacy concerns. These concerns jointly motivate Beyond-Voice, a novel high-fidelity acoustic sensing system that allows commodity home assistant devices to track and reconstruct hand poses continuously. It turns the home assistant into an active sonar system using its existing onboard microphones and speakers. A high-resolution range profile is fed to a deep learning model that analyzes the motions of multiple body parts and predicts the 3D positions of 21 finger joints, bringing the granularity of acoustic hand tracking to the next level. The system operates across different environments and users without the need for personalized training data. A user study with 11 participants in 3 different environments shows that Beyond-Voice can track joints with an average mean absolute error of 16.47 mm without any training data provided by the testing subject.
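
To make the "active sonar" and "range profile" idea concrete, the following is a minimal sketch of one common way such a profile can be computed: the speaker emits a known chirp, the microphone records the echoes, and cross-correlating the two gives echo strength as a function of distance. The abstract does not specify the waveform, frequency band, sample rate, or processing used by Beyond-Voice, so the 18 to 22 kHz chirp, the 48 kHz sample rate, and the function names `make_chirp` and `range_profile` are all assumptions chosen for illustration.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature


def make_chirp(fs, n_samples, f0, f1):
    """Linear chirp sweeping from f0 to f1 Hz over n_samples (assumed waveform)."""
    t = np.arange(n_samples) / fs
    duration = n_samples / fs
    sweep_rate = (f1 - f0) / duration
    return np.sin(2 * np.pi * (f0 * t + 0.5 * sweep_rate * t ** 2))


def range_profile(tx, rx, fs):
    """Cross-correlate the recording with the transmitted chirp.

    Returns (distances_m, magnitude), where lag n corresponds to a
    round-trip delay of n / fs seconds, i.e. a one-way distance of
    n / fs * SPEED_OF_SOUND / 2.
    """
    full = np.correlate(rx, tx, mode="full")
    corr = full[len(tx) - 1:]          # keep non-negative lags only
    lags = np.arange(len(corr))
    distances = lags / fs * SPEED_OF_SOUND / 2.0
    return distances, np.abs(corr)


if __name__ == "__main__":
    fs = 48_000
    tx = make_chirp(fs, 480, 18_000.0, 22_000.0)

    # Simulate a single reflector: an attenuated copy of the chirp
    # delayed by 140 samples (roughly 0.5 m away, an illustrative value).
    rx = np.zeros(2048)
    rx[140:140 + len(tx)] += 0.5 * tx

    distances, profile = range_profile(tx, rx, fs)
    print(f"strongest echo at about {distances[np.argmax(profile)]:.3f} m")
```

In a sketch like this, successive range profiles stacked over time would form the kind of input a learned model could map to joint positions; how Beyond-Voice actually builds and consumes its profile is not described in the abstract.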

