Owing to significant advances in large-scale text-to-image generation with diffusion models (DMs), controllable human image generation has recently attracted much attention. Although existing works such as ControlNet [36], T2I-Adapter [20], and HumanSD [10] have demonstrated strong abilities in generating human images from pose conditions, they still fail to meet the requirements of real e-commerce scenarios. These requirements include: (1) the interaction between the displayed product and the human model must be considered; (2) human parts such as the face, hands, arms, and feet, as well as the interaction between the human model and the product, should be hyper-realistic; and (3) the identity of the product shown in the advertisement must be exactly consistent with the product itself. To this end, in this paper we first define a new human image generation task for e-commerce marketing, i.e., Object-ID-retentive Human-object interaction image Generation (OHG), and then propose the VirtualModel framework to generate human images for product display, supporting any category of product and any type of human-object interaction. As shown in Figure 1, VirtualModel not only outperforms other methods in terms of accurate pose control and image quality but also enables the display of user-specified products by maintaining product-ID consistency and enhancing the plausibility of human-object interaction. Codes and data will be released.