Vision sensors provide a lightweight solution for spacecraft proximity operations, but monocular spacecraft 6D pose estimation remains difficult under illumination variation, specular reflection, shadowing, weak texture, and background interference. These factors make local visual evidence spatially unreliable and can destabilize pose regression. This article proposes a Precision-Aware Illumination-Disentangled Vision Transformer (PAID-ViT) for robust spacecraft pose estimation. The proposed model separates pose-relevant structure tokens from illumination-sensitive appearance tokens, estimates patch reliability before pose aggregation, and uses foreground mask supervision to preserve silhouette cues. A parameter-free geometric recovery module converts normalized crop coordinates, log-depth, and a continuous 6D rotation representation into camera-frame rotation and translation. Experiments on SPEED+ V2 (the SPEED+ validation/lightbox/sunlamp evaluation configuration used in this study) suggest that PAID-ViT reduces translation error and improves robustness in the challenging sunlamp domain, while ablation studies support the complementary roles of illumination disentanglement, reliability-aware token aggregation, mask supervision, and training-side regularization.
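The parameter-free geometric recovery step described above can be illustrated with a minimal sketch. It is not the article's implementation: it assumes the continuous 6D rotation representation is decoded by Gram-Schmidt orthogonalization of two 3-vectors, that the normalized crop coordinates lie in [0, 1] relative to a pixel-space crop box `(x0, y0, w, h)`, and that translation follows from pinhole back-projection with intrinsics `fx, fy, cx, cy`. The function names and these conventions are hypothetical.

```python
import math


def _normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    if n < 1e-12:
        raise ValueError("degenerate vector in 6D rotation representation")
    return [c / n for c in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def rotation_from_6d(r6):
    """Decode six numbers into a 3x3 rotation matrix (list of rows).

    Assumed convention: r6 = (a1, a2) holds two 3-vectors; Gram-Schmidt
    yields orthonormal b1, b2, and b3 = b1 x b2 completes a right-handed
    frame. The columns of the result are b1, b2, b3.
    """
    a1, a2 = list(r6[:3]), list(r6[3:6])
    b1 = _normalize(a1)
    d = _dot(b1, a2)
    b2 = _normalize([y - d * x for x, y in zip(b1, a2)])
    b3 = _cross(b1, b2)
    return [[b1[i], b2[i], b3[i]] for i in range(3)]


def translation_from_crop(uv_norm, log_depth, crop_box, fx, fy, cx, cy):
    """Recover a camera-frame translation from crop-relative predictions.

    Assumed convention: uv_norm is the projected object centre in [0, 1]^2
    relative to crop_box = (x0, y0, w, h) in pixels; log_depth is the
    natural log of the depth along the optical axis.
    """
    x0, y0, w, h = crop_box
    u = x0 + uv_norm[0] * w  # crop coordinates back to full-image pixels
    v = y0 + uv_norm[1] * h
    z = math.exp(log_depth)  # exponentiation keeps depth strictly positive
    return [(u - cx) * z / fx, (v - cy) * z / fy, z]
```

Neither function has learnable weights, which is the sense in which such a step is parameter-free: the network regresses only the nine intermediate quantities, and the mapping to rotation and translation is fixed by geometry.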