The practical deployment gap, namely the transition from controlled multi-view 3D skeleton capture to unconstrained monocular 2D pose estimation, introduces a compound domain shift whose safety implications remain underexplored. We present a systematic study of this shift using a novel Gym2D dataset (style/viewpoint shift) and the UCF101 dataset (semantic shift). Our Skeleton Transformer achieves 63.2% cross-subject accuracy on NTU-120 but drops to 1.6% under zero-shot transfer to Gym2D and to 1.16% on UCF101. Critically, we demonstrate that a high out-of-distribution (OOD) detection AUROC does not guarantee safe selective classification. Standard uncertainty methods fail to flag this performance drop: the model remains confidently incorrect, with 99.6% risk even at 50% coverage on both OOD datasets. Energy-based scoring (AUROC >= 0.91) and Mahalanobis distance provide reliable distributional detection signals, yet these high AUROC scores coexist with poor risk-coverage behavior once the scores are used to decide which predictions to accept. A lightweight fine-tuned gating mechanism restores calibration and enables graceful abstention, substantially reducing the rate of confident wrong predictions. Our work challenges standard deployment assumptions and provides a principled safety analysis of skeleton recognition deployment under both semantic and geometric shift.
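The distinction between detection AUROC and risk-coverage behavior can be illustrated with a minimal sketch. It is not the implementation used in this study: the helper names (`energy_score`, `auroc`, `risk_at_coverage`, `synthetic_demo`) are hypothetical, the logits are synthetic Gaussian draws rather than Skeleton Transformer outputs, and maximum softmax probability is assumed as the confidence score for abstention. AUROC asks whether in-distribution and shifted inputs can be told apart, whereas risk at a given coverage asks how often the retained predictions on the shifted data are wrong.

```python
import numpy as np


def energy_score(logits, temperature=1.0):
    """Return T * logsumexp(logits / T); higher values look more in-distribution."""
    z = np.asarray(logits, dtype=float) / temperature
    m = z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    return temperature * (m[:, 0] + np.log(np.exp(z - m).sum(axis=1)))


def auroc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return float((pos > neg).mean() + 0.5 * (pos == neg).mean())


def risk_at_coverage(confidence, correct, coverage):
    """Error rate on the `coverage` fraction of samples with the highest confidence."""
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=float)
    n_keep = max(1, int(round(coverage * len(confidence))))
    keep = np.argsort(-confidence)[:n_keep]  # indices of the most confident samples
    return float(1.0 - correct[keep].mean())


def synthetic_demo(seed=0, n=1000, num_classes=10):
    """Synthetic illustration only; none of these numbers come from the study."""
    rng = np.random.default_rng(seed)

    # In-distribution: the true class receives a large logit boost.
    id_labels = rng.integers(0, num_classes, size=n)
    id_logits = rng.normal(0.0, 1.0, size=(n, num_classes))
    id_logits[np.arange(n), id_labels] += 8.0

    # Shifted domain: logits carry no information about the labels.
    ood_labels = rng.integers(0, num_classes, size=n)
    ood_logits = rng.normal(0.0, 1.0, size=(n, num_classes))

    # Detection: can the energy score separate the two domains?
    detection_auroc = auroc(energy_score(id_logits), energy_score(ood_logits))

    # Selective classification on the shifted domain, using max softmax confidence.
    shifted = ood_logits - ood_logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    confidence = probs.max(axis=1)
    correct = ood_logits.argmax(axis=1) == ood_labels
    risk_50 = risk_at_coverage(confidence, correct, coverage=0.5)

    return {"detection_auroc": detection_auroc, "risk_at_50_coverage": risk_50}


if __name__ == "__main__":
    print(synthetic_demo())
```

In this toy setting the energy score separates the two domains almost perfectly, yet abstaining on the least confident half of the shifted samples leaves the error rate of the retained half close to chance, because confidence within the shifted domain is unrelated to correctness.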