Breaking Coordinate Overfitting: Geometry-Aware WiFi Sensing for Cross-Layout 3D Pose Estimation

WiFi-based 3D human pose estimation offers a low-cost, privacy-preserving alternative to vision-based systems for smart interaction. However, existing approaches use visual 3D poses as supervision and directly regress channel state information (CSI) to poses in a camera-based coordinate system. We find that this practice leads to coordinate overfitting: models memorize deployment-specific WiFi transceiver layouts instead of learning activity-relevant representations, which causes severe generalization failures. To address this challenge, we present PerceptAlign, the first geometry-conditioned framework for WiFi-based cross-layout pose estimation. PerceptAlign introduces a lightweight coordinate unification procedure that aligns WiFi and vision measurements in a shared 3D space using only two checkerboards and a few photos. Within this unified space, it encodes calibrated transceiver positions into high-dimensional embeddings and fuses them with CSI features, so that device geometry enters the model explicitly as a conditioning variable. This design forces the network to disentangle human motion from the deployment layout, enabling robust and, for the first time, layout-invariant WiFi pose estimation. To support systematic evaluation, we construct the largest cross-domain 3D WiFi pose estimation dataset to date, comprising 21 subjects, 5 scenes, 18 actions, and 7 device layouts. Experiments show that PerceptAlign reduces in-domain error by 12.3% and cross-domain error by more than 60% compared to state-of-the-art baselines. These results establish geometry-conditioned learning as a viable path toward scalable and practical WiFi sensing.
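
The geometry-conditioning step described above can be illustrated with a minimal sketch. The abstract does not specify how transceiver positions are embedded or fused, so everything here is an assumption: the sinusoidal (Fourier-feature) encoding, the fusion by concatenation, and the hypothetical names `encode_position`, `fuse`, and `num_freqs`. Positions are taken to be calibrated coordinates in meters in the shared 3D space.

```python
import math


def encode_position(xyz, num_freqs=4):
    """Map a 3D position to a sinusoidal embedding of length 3 * 2 * num_freqs.

    Assumption: the Fourier-feature form is illustrative only; the abstract
    says only that positions become high-dimensional embeddings.
    """
    features = []
    for coord in xyz:
        for k in range(num_freqs):
            freq = (2.0 ** k) * math.pi  # geometrically spaced frequencies
            features.append(math.sin(freq * coord))
            features.append(math.cos(freq * coord))
    return features


def fuse(csi_features, tx_xyz, rx_xyz, num_freqs=4):
    """Concatenate CSI features with transmitter and receiver embeddings.

    Assumption: concatenation stands in for whatever fusion the model uses.
    """
    return (
        list(csi_features)
        + encode_position(tx_xyz, num_freqs)
        + encode_position(rx_xyz, num_freqs)
    )


# The same CSI features under two different receiver placements.
csi = [0.1, 0.2]
layout_a = fuse(csi, tx_xyz=(0.0, 0.0, 0.0), rx_xyz=(1.0, 0.0, 0.0))
layout_b = fuse(csi, tx_xyz=(0.0, 0.0, 0.0), rx_xyz=(1.5, 0.0, 0.0))
```

In this sketch the two inputs share identical CSI entries but differ in their geometry entries, so a downstream network receives the layout as an explicit input instead of having to infer it from the CSI alone.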
