Anonymizing human-centric video data is an understudied problem. Prior anonymization techniques either blur or redact pixels, at the cost of realism and downstream utility, or generate content frame by frame, at the cost of temporal coherence. We introduce ReGenHuman, the first full-body video anonymization pipeline that is simultaneously realistic, temporally consistent, and anonymous by construction. Unlike past approaches, which redact or edit the inputs directly, we propose a "regenerate, don't edit" paradigm. Our approach composites 2D pose, segmentation, and monocular depth into two complementary conditioning streams, StructAll and StructHuman, which are used to fine-tune a video-to-video diffusion backbone on in-the-wild human videos, synthesizing the human regions entirely from identity-free structural cues. We evaluate our model on privacy, quality, and utility, and show that ReGenHuman achieves the best tradeoff across all three axes compared with current baselines. We further show that our anonymized videos remain effective for downstream tasks, including video question answering.
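To make the compositing idea concrete, the following is a minimal sketch of how pose, segmentation, and depth could be combined into two conditioning streams. The abstract does not specify how the cues are combined, so every detail here is an assumption: the function name `composite_streams` is hypothetical, as are the channel layout, the per-clip depth normalization, and the choice to form StructHuman by masking StructAll to human pixels. The one property taken from the abstract is that no RGB appearance enters either stream.

```python
import numpy as np


def composite_streams(pose_map, seg_mask, depth):
    """Hypothetical compositing of per-frame structural cues (not the authors' code).

    pose_map: (T, H, W) float array in [0, 1], a rasterized 2D skeleton.
    seg_mask: (T, H, W) bool array, True on human pixels.
    depth:    (T, H, W) float array, monocular depth at arbitrary scale.
    Returns two (T, H, W, 3) float32 arrays: (struct_all, struct_human).
    """
    # Assumption: normalize depth to [0, 1] over the whole clip.
    d_min, d_max = float(depth.min()), float(depth.max())
    depth_norm = (depth - d_min) / max(d_max - d_min, 1e-8)

    human = seg_mask.astype(np.float32)

    # Assumption: StructAll stacks the cues for the full frame as channels.
    struct_all = np.stack([pose_map, human, depth_norm], axis=-1).astype(np.float32)

    # Assumption: StructHuman keeps the same cues only inside the human mask.
    struct_human = struct_all * human[..., None]

    return struct_all, struct_human


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, H, W = 2, 4, 5
    pose = rng.random((T, H, W))
    depth = rng.random((T, H, W)) * 10.0
    seg = np.zeros((T, H, W), dtype=bool)
    seg[:, 1:3, 1:4] = True

    s_all, s_human = composite_streams(pose, seg, depth)
    print(s_all.shape, s_human.shape)
```

In a sketch like this, both streams would be passed as conditioning inputs to the video-to-video diffusion backbone during fine-tuning. How the actual pipeline encodes and injects them is not stated in the abstract.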