Human de-occlusion, which aims to infer the appearance of invisible human parts from an occluded image, has great value in many human-related tasks such as person re-identification and intention inference. To address this task, this paper proposes a dynamic mask-aware transformer (DMAT), which dynamically amplifies information from human regions and suppresses that from occluders. First, to enhance token representations, we design an expanded convolution head with enlarged kernels, which captures more valid local context and mitigates the influence of surrounding occlusion. To concentrate on the visible human parts, we propose a novel dynamic multi-head human-mask guided attention mechanism that integrates multiple masks and prevents the de-occluded regions from being assimilated into the background. In addition, a region upsampling strategy is employed to alleviate the impact of occlusion on interpolated images. During training, an amodal loss is introduced to further emphasize the recovery of human regions, which also improves the model's convergence. Extensive experiments on the AHP dataset demonstrate superior performance over recent state-of-the-art methods.
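The core idea of mask-guided attention can be illustrated with a minimal NumPy sketch. This is a hypothetical simplification, not the paper's implementation: it assumes a single binary human-visibility mask resized to token resolution, and adds a large negative bias to attention logits for non-human key tokens so that queries aggregate context only from visible human regions. The function name and shapes are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mask_guided_attention(q, k, v, human_mask, num_heads=2):
    """Multi-head attention where keys outside the visible-human mask
    are suppressed, so tokens attend to valid human context rather
    than occluder or background tokens (illustrative sketch only).

    q, k, v: (T, D) token features; human_mask: (T,) with 1 = visible human.
    """
    T, D = q.shape
    dh = D // num_heads
    out = np.zeros_like(q, dtype=float)
    # Large negative bias on non-human keys drives their weights to ~0.
    bias = np.where(human_mask > 0, 0.0, -1e9)
    for h in range(num_heads):
        s = slice(h * dh, (h + 1) * dh)
        logits = q[:, s] @ k[:, s].T / np.sqrt(dh) + bias[None, :]
        out[:, s] = softmax(logits, axis=-1) @ v[:, s]
    return out
```

Because the bias zeroes the attention weights of masked-out keys, the output for every query token is a convex combination of the visible-human value vectors only, which is the behavior the abstract describes as preventing de-occluded regions from assimilating to the background.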