Segment Anything (SAM) provides an unprecedented foundation for human segmentation, but it may struggle under occlusion, where keypoints can be partially or fully invisible. We adapt SAM 2.1 for pose-guided segmentation with minimal encoder modifications, retaining its strong generalization. Using a fine-tuning strategy called PoseMaskRefine, we incorporate pose keypoints with high visibility into the iterative correction process originally employed by SAM, yielding improved robustness and accuracy across multiple datasets. During inference, we simplify prompting by selecting only the three keypoints with the highest visibility. This strategy reduces sensitivity to common errors, such as missing body parts or misclassified clothing, and allows accurate mask prediction from as few as a single keypoint. Our results demonstrate that pose-guided fine-tuning of SAM enables effective, occlusion-aware human segmentation while preserving the generalization capabilities of the original model. The code and pretrained models will be available at https://mirapurkrabek.github.io/BBox-Mask-Pose/.
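The top-3 keypoint selection described above can be sketched as follows. This is a minimal illustration, assuming the common pose-estimation convention of `(x, y, visibility)` triplets per keypoint; the function name and exact format are assumptions, not the authors' implementation.

```python
import numpy as np

def top_k_visible_keypoints(keypoints, k=3):
    """Select the k keypoints with the highest visibility score.

    keypoints: (N, 3) array-like of (x, y, visibility) rows -- a common
    pose-estimation convention; the exact format here is an assumption.
    Returns the (k, 2) point coordinates and their original indices,
    suitable as point prompts for a SAM-style mask predictor.
    """
    kpts = np.asarray(keypoints, dtype=float)
    order = np.argsort(-kpts[:, 2])  # sort by visibility, descending
    top = order[:k]
    return kpts[top, :2], top

# Example: 5 keypoints, three clearly visible, two occluded
kpts = [
    (10, 20, 0.9),
    (15, 25, 0.1),
    (30, 40, 0.8),
    (50, 60, 0.0),
    (12, 33, 0.7),
]
points, idx = top_k_visible_keypoints(kpts)
# idx -> [0, 2, 4]: the occluded keypoints (visibility 0.1, 0.0) are skipped
```

Because only the most reliably visible keypoints are kept, an occluded or misdetected joint never reaches the prompt, which is what makes the strategy robust to missing body parts.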