Accurate 3D hand pose and pressure sensing is essential for immersive human-computer interaction, yet achieving both simultaneously in mobile scenarios remains a significant challenge. We present WristPP, a camera-based wrist-worn system that estimates 3D hand pose and per-vertex pressure from a single wide-FOV RGB frame in real time. A Vision Transformer (ViT) backbone with joint-aligned tokens predicts Hand-VQVAE codebook indices for mesh recovery, while an extrinsics-conditioned branch jointly estimates per-vertex pressure. On a self-collected dataset of 133,000 frames (20 subjects; 48 on-plane and 28 mid-air gestures), WristPP attains a Mean Per-Joint Position Error (MPJPE) of 2.9 mm, a Contact IoU of 0.712, a Volumetric IoU of 0.618, and a foreground pressure MAE of 10.4 g. Across three user studies, WristPP delivers touchpad-level efficiency in mid-air pointing and robust multi-finger pressure control on an uninstrumented desktop. In a real-world large-display Whac-A-Mole task, WristPP also achieves a higher success rate and lower arm fatigue than head-mounted camera-based baselines. These results position WristPP as an effective, mobile solution for versatile pose- and pressure-based interaction. Website: https://zhenqis123.github.io/WristPP/.
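
For readers unfamiliar with the four reported metrics, the following is a minimal sketch of how such quantities are commonly computed. It is not the authors' evaluation code. The function names, the contact threshold `thresh`, the min/max form of Volumetric IoU, and the reading of "foreground" as vertices with nonzero ground-truth pressure are all assumptions made for illustration; the abstract does not define them.

```python
import math


def mpjpe(pred_joints, gt_joints):
    """Mean Euclidean distance between corresponding 3D joints.

    The result is in the same unit as the inputs (e.g. mm).
    """
    dists = [math.dist(p, g) for p, g in zip(pred_joints, gt_joints)]
    return sum(dists) / len(dists)


def contact_iou(pred_pressure, gt_pressure, thresh=0.0):
    """IoU of binary contact masks (assumed: contact means pressure > thresh)."""
    pred_mask = [v > thresh for v in pred_pressure]
    gt_mask = [v > thresh for v in gt_pressure]
    inter = sum(a and b for a, b in zip(pred_mask, gt_mask))
    union = sum(a or b for a, b in zip(pred_mask, gt_mask))
    # Two empty masks agree perfectly, so return 1.0 in that case.
    return inter / union if union else 1.0


def volumetric_iou(pred_pressure, gt_pressure):
    """Pressure-weighted IoU (assumed form: sum of minima over sum of maxima)."""
    num = sum(min(a, b) for a, b in zip(pred_pressure, gt_pressure))
    den = sum(max(a, b) for a, b in zip(pred_pressure, gt_pressure))
    return num / den if den else 1.0


def foreground_mae(pred_pressure, gt_pressure, thresh=0.0):
    """Mean absolute error over vertices whose ground truth exceeds thresh."""
    errs = [abs(a - b) for a, b in zip(pred_pressure, gt_pressure) if b > thresh]
    return sum(errs) / len(errs) if errs else 0.0


# Toy inputs (hypothetical values, not taken from the dataset).
pred_j = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
gt_j = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
pred_p = [0.0, 10.0, 20.0, 5.0]  # per-vertex pressure in grams
gt_p = [0.0, 20.0, 20.0, 0.0]

print(mpjpe(pred_j, gt_j))           # 2.5
print(contact_iou(pred_p, gt_p))     # 2 of 3 contact vertices overlap
print(volumetric_iou(pred_p, gt_p))  # 30 / 45
print(foreground_mae(pred_p, gt_p))  # 5.0
```

Under these assumed definitions, Contact IoU only rewards locating the contact region, whereas Volumetric IoU and foreground MAE also penalize errors in pressure magnitude, which is consistent with the Volumetric IoU being the lower of the two reported IoU values.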