Vision Language Models (VLMs) have demonstrated remarkable performance in open-world zero-shot visual recognition. However, their potential in space-related applications remains largely unexplored. In the space domain, accurate manual annotation is particularly challenging due to factors such as low visibility, illumination variations, and objects blending with planetary backgrounds. Developing methods that can detect and segment spacecraft and orbital targets without extensive manual labeling is therefore of critical importance. In this work, we propose an annotation-free detection and segmentation pipeline for space targets using VLMs. Our approach begins by automatically generating pseudo-labels for a small subset of unlabeled real data with a pre-trained VLM. These pseudo-labels are then leveraged in a teacher-student label distillation framework to train lightweight models. Despite the inherent noise in the pseudo-labels, the distillation process yields substantial performance gains over direct zero-shot VLM inference. Experimental evaluations on segmentation tasks with the SPARK-2024, SPEED+, and TANGO datasets demonstrate consistent improvements in average precision (AP) of up to 10 points. Code and models are available at https://github.com/giddyyupp/annotation-free-spacecraft-segmentation.
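The two-step pipeline in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy teacher, the confidence threshold of 0.5, and the `distill` helper are all assumptions standing in for a real VLM (step 1) and a lightweight segmentation model trained on its pseudo-labels (step 2).

```python
# Hedged sketch of the annotation-free pipeline: a frozen VLM "teacher"
# produces pseudo-labels on unlabeled images, and a lightweight "student"
# is then fit to those labels. All names here are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class PseudoLabel:
    image_id: str
    mask: list        # toy stand-in for a binary segmentation mask
    score: float      # teacher confidence for this prediction

def generate_pseudo_labels(teacher, images, min_score=0.5):
    """Step 1: run the frozen VLM teacher on unlabeled images and keep
    only confident predictions as pseudo-labels (a simple noise filter;
    the threshold is a hypothetical choice)."""
    labels = []
    for img_id, pixels in images.items():
        for mask, score in teacher(pixels):
            if score >= min_score:
                labels.append(PseudoLabel(img_id, mask, score))
    return labels

def distill(student_update, pseudo_labels, epochs=3):
    """Step 2: teacher-student label distillation -- the lightweight
    student repeatedly fits the pseudo-labels; `student_update` stands
    in for one gradient step of a real segmentation model."""
    student = {"steps": 0}
    for _ in range(epochs):
        for lbl in pseudo_labels:
            student_update(student, lbl)
    return student

# Toy teacher: one mask per image, with mean intensity as a fake confidence.
def toy_teacher(pixels):
    return [([int(p > 0) for p in pixels], sum(pixels) / (len(pixels) or 1))]

images = {"img0": [0.9, 0.8, 0.0], "img1": [0.1, 0.0, 0.0]}
labels = generate_pseudo_labels(toy_teacher, images, min_score=0.5)
student = distill(lambda s, lbl: s.update(steps=s["steps"] + 1), labels)
```

In this toy run, only the high-confidence image survives the filter, mirroring the idea that filtering noisy VLM outputs before distillation is what makes the pseudo-labels usable for training.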