VQ-HPS: Human Pose and Shape Estimation in a Vector-Quantized Latent Space

Previous work on Human Pose and Shape Estimation (HPSE) from RGB images falls into two broad groups: parametric and non-parametric approaches. Parametric techniques leverage a low-dimensional statistical body model to produce realistic results, whereas recent non-parametric methods achieve higher precision by directly regressing the 3D coordinates of the human body mesh. This work introduces a new paradigm for HPSE that uses a low-dimensional discrete latent representation of the human mesh and frames HPSE as a classification task. Instead of predicting body model parameters or 3D vertex coordinates, we predict the proposed discrete latent representation, which can be decoded into a registered human mesh. This paradigm offers two key advantages. First, predicting a low-dimensional discrete representation confines our predictions to the space of anthropomorphic poses and shapes, even when little training data is available. Second, by framing the problem as a classification task, we can harness the discriminative power of neural networks. The proposed model, VQ-HPS, predicts the discrete latent representation of the mesh. Experimental results show that VQ-HPS outperforms current state-of-the-art non-parametric approaches while yielding results as realistic as those produced by parametric methods when trained with little data. VQ-HPS also shows promising results when trained on large-scale datasets, highlighting the potential of the classification approach for HPSE. See the project page at https://g-fiche.github.io/research-pages/vqhps/
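
As a rough illustration of the classification framing described above, the following sketch shows how a mesh could be represented as a short sequence of codebook indices, how a cross-entropy loss over those indices could be computed, and how predicted indices could be mapped back to codebook vectors for decoding. It is a minimal sketch under stated assumptions, not the VQ-HPS implementation: the sizes, the function names `quantize` and `cross_entropy`, the random codebook, and the random stand-in for image-conditioned logits are all hypothetical and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration (not from the paper).
NUM_TOKENS = 4      # discrete latent positions describing one mesh
CODEBOOK_SIZE = 8   # number of codebook entries (the "classes")
EMBED_DIM = 3       # dimension of each codebook vector

# Assumed stand-in for a learned codebook.
codebook = rng.normal(size=(CODEBOOK_SIZE, EMBED_DIM))


def quantize(latents, codebook):
    """Return, for each latent vector, the index of its nearest codebook entry."""
    # Pairwise squared distances, shape (num_latents, codebook_size).
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def cross_entropy(logits, targets):
    """Mean cross-entropy between per-token logits and target class indices."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()


# Target indices: here obtained by quantizing random latents; in practice they
# would come from encoding a ground-truth mesh (assumption).
mesh_latents = rng.normal(size=(NUM_TOKENS, EMBED_DIM))
target_indices = quantize(mesh_latents, codebook)

# Random stand-in for the logits an image-conditioned network would output.
logits = rng.normal(size=(NUM_TOKENS, CODEBOOK_SIZE))
loss = cross_entropy(logits, target_indices)

# At inference, pick the most likely index per token and look up its vector;
# a mesh decoder (not shown) would turn these vectors into vertices.
predicted_indices = logits.argmax(axis=1)
quantized = codebook[predicted_indices]
```

Because every prediction is an index into a fixed codebook, the decoder only ever sees combinations of learned codebook vectors, which is one way to read the abstract's claim that predictions stay confined to plausible poses and shapes.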
