RemoCap: Disentangled Representation Learning for Motion Capture

Reconstructing 3D human bodies from realistic motion sequences remains challenging because of pervasive and complex occlusions. Current methods struggle to capture the dynamics of occluded body parts, which leads to model penetration and distorted motion. RemoCap overcomes these limitations with two modules, Spatial Disentanglement (SD) and Motion Disentanglement (MD). SD addresses occlusion interference between the target human body and surrounding objects by disentangling target features along the dimension axis. By aligning features according to their spatial positions in each dimension, SD isolates the target object's response within a global window, enabling accurate capture despite occlusions. MD employs a channel-wise temporal shuffling strategy to simulate diverse scene dynamics. This process disentangles motion features, allowing RemoCap to reconstruct occluded parts with greater fidelity. The paper further introduces a sequence velocity loss that promotes temporal coherence by constraining inter-frame velocity errors, so that the predicted motion stays realistically consistent over time. Extensive comparisons with state-of-the-art (SOTA) methods on benchmark datasets demonstrate RemoCap's superior performance in 3D human body reconstruction. On the 3DPW dataset, RemoCap outperforms all competitors, achieving the best MPVPE (81.9), MPJPE (72.7), and PA-MPJPE (44.1), all in mm (lower is better). Code is available at https://wanghongsheng01.github.io/RemoCap/.
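The abstract describes MD only as a "channel-wise temporal shuffling strategy." The sketch below is one hypothetical reading of that phrase, not RemoCap's actual module: on a T x C feature sequence, a random fraction of channels (the assumed `ratio` hyperparameter) each has its values reordered along the time axis, while the remaining channels keep their original timing. The function name, the per-channel independent permutation, and the list-of-lists layout are all assumptions made for illustration.

```python
import random
from typing import List, Optional


def channel_temporal_shuffle(
    features: List[List[float]], ratio: float = 0.25, seed: Optional[int] = None
) -> List[List[float]]:
    """Hypothetical MD-style augmentation on a T x C feature sequence.

    For a random subset of channels (fraction `ratio`), each chosen channel is
    reordered along the time axis with its own random permutation. Unselected
    channels keep their original temporal order. This is an assumed scheme.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    rng = random.Random(seed)
    num_frames = len(features)
    num_channels = len(features[0]) if num_frames else 0
    k = int(round(ratio * num_channels))
    chosen = rng.sample(range(num_channels), k)  # channels to shuffle
    out = [list(frame) for frame in features]  # copy; the input is not modified
    for c in chosen:
        perm = list(range(num_frames))
        rng.shuffle(perm)  # independent temporal permutation for this channel
        for t in range(num_frames):
            out[t][c] = features[perm[t]][c]
    return out
```

Because each shuffled channel keeps its own set of values and only their timing changes, the augmentation perturbs motion cues without altering per-channel statistics; `ratio=0` returns an unchanged copy.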
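The sequence velocity loss is said to constrain inter-frame velocity errors. A minimal sketch, assuming velocities are first differences of joint positions (v_t = p_{t+1} - p_t) and that the loss is the mean Euclidean distance between predicted and ground-truth velocities over all frames and joints; the paper's exact norm, weighting, and use of joints versus mesh vertices are not stated here and may differ. `frame_velocities` and `sequence_velocity_loss` are hypothetical names.

```python
import math
from typing import List, Sequence

Frame = Sequence[Sequence[float]]  # J joints, each an (x, y, z) point


def frame_velocities(seq: Sequence[Frame]) -> List[List[List[float]]]:
    """v_t = p_{t+1} - p_t for every joint; returns T-1 frames of velocities."""
    return [
        [[b - a for a, b in zip(p0, p1)] for p0, p1 in zip(f0, f1)]
        for f0, f1 in zip(seq[:-1], seq[1:])
    ]


def sequence_velocity_loss(pred: Sequence[Frame], target: Sequence[Frame]) -> float:
    """Mean L2 distance between predicted and ground-truth inter-frame velocities."""
    if len(pred) != len(target) or len(pred) < 2 or not pred[0]:
        raise ValueError("need two aligned, non-empty sequences of at least 2 frames")
    total, count = 0.0, 0
    for vp, vt in zip(frame_velocities(pred), frame_velocities(target)):
        for jp, jt in zip(vp, vt):
            total += math.dist(jp, jt)
            count += 1
    return total / count
```

A constant offset between prediction and ground truth cancels out in the differences, so this term penalizes jitter and wrong motion speed rather than absolute position; it would complement, not replace, a per-frame position loss.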
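For reference, MPJPE is the mean Euclidean distance between predicted and ground-truth 3D joints. The sketch below computes it for a single frame and assumes the common convention of subtracting a root joint (index 0 here, an assumption) from both skeletons first; benchmark protocols differ on which joint is the root. MPVPE is the same average taken over mesh vertices instead of joints, and PA-MPJPE first aligns the prediction to the ground truth with a similarity (Procrustes) transform, which is not shown.

```python
import math
from typing import Optional, Sequence

Joints = Sequence[Sequence[float]]  # J points, each (x, y, z), e.g. in mm


def mpjpe(pred: Joints, target: Joints, root: Optional[int] = 0) -> float:
    """Mean per-joint position error for one frame, in the units of the input.

    If `root` is given, both skeletons are translated so that joint `root`
    sits at the origin before comparing (assumed convention).
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must have the same non-zero joint count")
    if root is not None:
        pr, tr = pred[root], target[root]
        pred = [[a - r for a, r in zip(p, pr)] for p in pred]
        target = [[a - r for a, r in zip(t, tr)] for t in target]
    return sum(math.dist(p, t) for p, t in zip(pred, target)) / len(pred)
```

A sequence-level score is then the mean of this value over all frames.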


Related content

Representation Learning


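Representation learning learns vector representations from training data, which overcomes the limitations of hand-crafted approaches. Representation learning methods generally fall into two categories: unsupervised and supervised. Most unsupervised methods use the latent variables of an autoencoder (such as a denoising or sparse autoencoder) as the representation. The more recent variational autoencoders are more tolerant of noise and outliers. However, exactly inferring the latent structure of given data is generally intractable, so approximate inference strategies are used. In addition, some unsupervised methods aim to approximate a specific similarity measure. For example, one unsupervised, similarity-preserving representation learning framework uses matrix factorization to preserve pairwise DTW (dynamic time warping) similarities. DTW-preserving shapelets can also be learned, so that Euclidean distances in the transformed space approximate the true DTW distances of the original data. Supervised representation learning methods can exploit label information to better capture the semantic structure of the data. Siamese networks and triplet networks are two currently popular models; their goal is to maximize inter-class distances and minimize intra-class distances.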
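The DTW-preserving methods mentioned above use DTW distance as the target that the learned embedding should approximate. The sketch below computes only that target, using the classic dynamic-programming recurrence on two 1-D sequences with absolute difference as the local cost (an assumed choice; some variants use squared differences and take a square root). It does not implement the matrix-factorization or shapelet learning itself, and `dtw_distance` is a hypothetical name.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic DTW distance between two 1-D sequences (absolute-difference cost)."""
    if not a or not b:
        raise ValueError("sequences must be non-empty")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: insertion, deletion, or match
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

Unlike Euclidean distance, DTW can compare sequences of different lengths: `dtw_distance([1, 2, 3], [1, 1, 2, 2, 3, 3])` is 0 because the warping path absorbs the repeated values.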
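Triplet networks pursue the stated goal (small intra-class, large inter-class distances) with a margin-based objective on (anchor, positive, negative) embeddings. A minimal sketch of the standard triplet margin loss with Euclidean distance, assuming the embeddings are already computed; the margin value and function name are illustrative.

```python
import math
from typing import Sequence


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 1.0,
) -> float:
    """max(0, d(a, p) - d(a, n) + margin) with Euclidean distance d."""
    d_pos = math.dist(anchor, positive)  # same-class distance, pushed down
    d_neg = math.dist(anchor, negative)  # different-class distance, pushed up
    return max(0.0, d_pos - d_neg + margin)
```

The loss is zero once the negative is at least `margin` farther from the anchor than the positive, so training concentrates on triplets that still violate that gap.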
