Existing methods for multiple human parsing (MHP) apply statistical models to learn the underlying associations between images and labeled body parts. However, the learned associations often contain many spurious correlations that degrade generalization, leaving statistical models vulnerable to variations in visual context (e.g., unseen image styles or external interventions). To tackle this, we present a causality-inspired parsing paradigm termed CIParsing, which follows fundamental causal principles involving two causal properties for human parsing: causal diversity and causal invariance. Specifically, we assume that an input image is constructed from a mix of causal factors (the characteristics of body parts) and non-causal factors (external contexts), where only the former drive the generation process of human parsing. Since causal and non-causal factors are unobservable, a human parser in the proposed CIParsing is required to construct latent representations of the causal factors and to learn to enforce that these representations satisfy the two causal properties. In this way, the human parser is able to rely on causal factors, which carry the relevant evidence, rather than on non-causal factors, which carry spurious correlations, thus alleviating model degradation and improving parsing ability. Notably, CIParsing is designed in a plug-and-play fashion and can be integrated into any existing MHP model. Extensive experiments conducted on two widely used benchmarks demonstrate the effectiveness and generalizability of our method.