For facial motion capture and analysis, the dominant solutions are generally based on visual cues, which cannot protect privacy and are vulnerable to occlusions. Inertial measurement units (IMUs) offer a potential remedy, yet they have mainly been adopted for full-body motion capture. In this paper, we propose IMUSIC to fill this gap: a novel path for facial expression capture using purely IMU signals, distinctly different from previous visual solutions. The key design of IMUSIC is threefold. First, we design micro-IMUs suited to facial capture, accompanied by an anatomy-driven IMU placement scheme. Second, we contribute a novel IMU-ARKit dataset, which provides rich paired IMU/visual signals for diverse facial expressions and performances. This unique multimodality opens up promising future directions such as IMU-based facial behavior analysis. Third, utilizing IMU-ARKit, we introduce a strong baseline approach that accurately predicts facial blendshape parameters from purely IMU signals. Specifically, we tailor a Transformer diffusion model with a two-stage training strategy for this novel tracking task. The IMUSIC framework enables accurate facial capture in scenarios where visual methods falter, while simultaneously safeguarding user privacy. We conduct extensive experiments on both the IMU configuration and the technical components to validate the effectiveness of IMUSIC. Notably, IMUSIC enables various novel applications, e.g., privacy-protecting facial capture, hybrid capture that is robust to occlusions, and detection of minute facial movements that are often invisible to visual cues. We will release our dataset and implementation to open up more possibilities for facial capture and analysis in the community.
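To make the tracking task concrete, the sketch below shows a generic DDPM-style training step in which a denoiser is conditioned on an IMU window and regresses clean per-frame blendshape coefficients. It is not the IMUSIC model: the Transformer is replaced by a hypothetical `placeholder_denoiser`, the two-stage training strategy is not reproduced, and the step count, linear noise schedule, x0-prediction parameterization, and IMU channel layout (10 IMUs x 6 channels) are all illustrative assumptions. Only the 52-coefficient output size follows ARKit's blendshape set.

```python
import numpy as np

NUM_BLENDSHAPES = 52  # ARKit exposes 52 blendshape coefficients per frame
NUM_STEPS = 1000      # assumption: number of diffusion steps


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear variance schedule as in the original DDPM formulation."""
    return np.linspace(beta_start, beta_end, num_steps)


betas = linear_beta_schedule(NUM_STEPS)
alpha_bars = np.cumprod(1.0 - betas)  # cumulative signal-retention factors


def q_sample(x0, t, noise):
    """Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    abar = alpha_bars[t]
    return np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * noise


def placeholder_denoiser(x_t, t, imu_window):
    """Hypothetical stand-in for a conditional Transformer that predicts x0.

    A real model would attend over imu_window; this stub ignores it and
    just clamps x_t to the valid blendshape range [0, 1].
    """
    return np.clip(x_t, 0.0, 1.0)


def training_loss(x0, imu_window, rng):
    """One x0-prediction training step: noise x0, denoise, take the MSE."""
    t = rng.integers(0, NUM_STEPS)          # random diffusion timestep
    noise = rng.standard_normal(x0.shape)   # Gaussian noise eps
    x_t = q_sample(x0, t, noise)
    x0_pred = placeholder_denoiser(x_t, t, imu_window)
    return float(np.mean((x0_pred - x0) ** 2))


# Usage: 30 frames of blendshapes paired with a hypothetical IMU window.
rng = np.random.default_rng(0)
x0 = rng.uniform(0.0, 1.0, size=(30, NUM_BLENDSHAPES))
imu = rng.standard_normal((30, 60))  # assumption: 10 IMUs x 6 channels
loss = training_loss(x0, imu, rng)
```

At inference time, a model trained this way would start from Gaussian noise and iteratively denoise under the same IMU conditioning; the sampler, like the network itself, is left out of this sketch.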