Automatically producing instructions to modify a person's posture could enable many applications, such as personalized coaching and in-home physical therapy. Tackling the reverse problem (i.e., refining a 3D pose based on natural language feedback) could support assisted 3D character animation or robot teaching, for instance. Although a few recent works explore the connections between natural language and 3D human pose, none focuses on describing differences between 3D body poses. In this paper, we tackle the problem of correcting 3D human poses with natural language. To this end, we introduce the PoseFix dataset, which consists of several thousand pairs of 3D poses, each with corresponding text feedback that describes how the source pose needs to be modified to obtain the target pose. We demonstrate the potential of this dataset on two tasks: (1) text-based pose editing, which aims to generate corrected 3D body poses given a query pose and a text modifier; and (2) correctional text generation, where instructions are generated based on the differences between two body poses.