Vision-based perception is fundamental to Space Situational Awareness and autonomous on-orbit operations such as rendezvous, docking, servicing, and navigation. However, progress in this area is limited by the scarcity of annotated space imagery and by challenging visual-domain characteristics including severe illumination changes, low signal-to-noise ratio, and high contrast. We address Stream 1 of the SPARK 2026 Challenge, which requires a single model for spacecraft classification, detection, and fine-grained component segmentation across multiple target types. We propose a compact architecture that integrates a MobileNetV3 encoder with a U-Net-style decoder, combining computational efficiency with accurate dense prediction. Detection is derived analytically from the union of predicted component masks, avoiding a separate bounding-box regression head in the single-spacecraft setting. Our method achieved an overall leaderboard score of 0.9482, with task-specific scores of 1.0000 in classification, 0.9788 in detection, and 0.8917 in segmentation. The proposed approach ranked second overall in the SPARK 2026 Challenge, demonstrating that lightweight encoder-decoder architectures can deliver strong multi-task performance for practical onboard space vision systems.
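The mask-to-box step described in the abstract can be illustrated with a minimal sketch. It assumes the segmentation output is a 2D integer label map in which 0 marks background and every non-zero value marks a spacecraft component, and that boxes are reported as inclusive pixel coordinates `(x_min, y_min, x_max, y_max)`. The function name `bbox_from_label_map`, the label encoding, and the box convention are illustrative assumptions; the abstract does not specify them.

```python
import numpy as np


def bbox_from_label_map(label_map, background=0):
    """Return the tight box around the union of all component masks.

    Assumption (not stated in the abstract): `label_map` is an (H, W)
    integer array where `background` marks empty space and any other
    value marks a component. Returns (x_min, y_min, x_max, y_max) in
    inclusive pixel coordinates, or None if no component is predicted.
    """
    # Union of all component masks: every pixel that is not background.
    foreground = label_map != background
    if not foreground.any():
        return None

    # Indices of rows and columns containing at least one foreground pixel.
    rows = np.flatnonzero(foreground.any(axis=1))
    cols = np.flatnonzero(foreground.any(axis=0))
    return int(cols[0]), int(rows[0]), int(cols[-1]), int(rows[-1])


# Toy example: two overlapping components on a 6x8 grid.
toy = np.zeros((6, 8), dtype=np.int64)
toy[2:4, 3:6] = 1  # hypothetical component label 1
toy[3:5, 5:7] = 2  # hypothetical component label 2
print(bbox_from_label_map(toy))  # (3, 2, 6, 4)
```

Because the box is computed from the union of the masks, this shortcut only applies when a single spacecraft is in view, which matches the setting stated above; with several targets the union would merge them into one box.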