Humans have been depicted in many forms since antiquity. Before the invention of cameras, for example, sculptures and paintings were the primary media for depicting human beings. However, most current human-centric computer vision tasks, such as human pose estimation and human image generation, focus exclusively on natural images of the real world. Artificial humans, such as those in sculptures, paintings, and cartoons, are commonly neglected, which causes existing models to fail in these scenarios. As an abstraction of life, art incorporates humans in both natural and artificial scenes. We take advantage of this and introduce the Human-Art dataset to bridge related tasks in natural and artificial scenarios. Specifically, Human-Art contains 50k high-quality images with over 123k person instances from 5 natural and 15 artificial scenarios, annotated with bounding boxes, keypoints, self-contact points, and text information for humans represented in both 2D and 3D. It is therefore comprehensive and versatile for various downstream tasks. We also provide a rich set of baseline results and detailed analyses for related tasks, including human detection, 2D and 3D human pose estimation, image generation, and motion transfer. Because Human-Art is a challenging dataset, we hope it can provide insights for relevant research and open up new research questions.