Recent acoustic-to-articulatory inversion (AAI) models rely on electromagnetic articulography (EMA) data, which are costly and limited in scale. To address this limitation, we propose \textit{ArtBoost}, a novel data augmentation strategy that leverages large-scale speech--mesh datasets originally developed for speech-driven 3D facial animation to improve AAI under limited EMA supervision. \textit{ArtBoost} extracts pseudo articulatory trajectories from visible facial anchors and uses them for pre-training before fine-tuning on real EMA data. Experiments show consistent improvements in Pearson correlation coefficient (PCC) and root-mean-square error (RMSE). Trajectory analyses confirm that the pseudo articulatory signals reflect physically meaningful visible articulatory dynamics. Additional evaluations across different AAI architectures demonstrate stable performance gains, indicating that \textit{ArtBoost} can be integrated into diverse AAI models. These results suggest that speech--mesh data provide an effective and scalable source of articulatory supervision for AAI. Project page: https://cau-irislab.github.io/Interspeech26-ArtBoost/
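For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch of how PCC and RMSE are commonly computed between predicted and reference articulatory trajectories. It is an illustration only, not the authors' evaluation code: the function name `pcc_rmse`, the `(frames, channels)` array layout, and the per-channel reporting are assumptions made here.

```python
import numpy as np


def pcc_rmse(pred, ref):
    """Per-channel PCC and RMSE for trajectories shaped (frames, channels).

    Hypothetical helper for illustration; the array layout is an assumption.
    """
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    if pred.shape != ref.shape:
        raise ValueError("pred and ref must have the same shape")

    # Centre each channel over time before correlating.
    pc = pred - pred.mean(axis=0)
    rc = ref - ref.mean(axis=0)
    denom = np.sqrt((pc ** 2).sum(axis=0) * (rc ** 2).sum(axis=0))

    # A constant channel gives denom == 0, so its PCC is undefined (NaN).
    with np.errstate(invalid="ignore", divide="ignore"):
        pcc = (pc * rc).sum(axis=0) / denom

    # RMSE is the square root of the mean squared error over time.
    rmse = np.sqrt(((pred - ref) ** 2).mean(axis=0))
    return pcc, rmse


if __name__ == "__main__":
    # Synthetic two-channel trajectory; not real EMA data.
    t = np.linspace(0.0, 1.0, 50)
    ref = np.stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)], axis=1)
    pred = ref + 0.1  # constant offset: shape preserved, position shifted
    pcc, rmse = pcc_rmse(pred, ref)
    print("PCC per channel:", pcc)
    print("RMSE per channel:", rmse)
```

The synthetic example shows why the two metrics are usually reported together: a constant offset leaves the correlation untouched, because the trajectory shape is preserved, while RMSE still registers the positional error.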