Virtual eyeglasses try-on places eyeglasses of different shapes and styles onto a face image, so that users need not try them on physically. While existing methods have shown impressive results, the variety of eyeglasses styles they support is limited, and their interactions are not always intuitive or efficient. To address these limitations, we propose a Text-guided Eyeglasses Manipulation method that controls the shape of the eyeglasses through a binary mask and their style through text. Specifically, we introduce a mask encoder to extract mask conditions and a modulation module that injects the text and mask conditions simultaneously. This design allows fine-grained control of the eyeglasses' appearance based on both textual descriptions and spatial constraints. Our approach also includes a disentangled mapper and a decoupling strategy that preserve irrelevant areas, resulting in better local editing. Because the conditions from different modalities converge at different speeds, we employ a two-stage training scheme, which lets the model control both the shape and the style of the eyeglasses. Extensive comparison experiments and ablation analyses demonstrate the effectiveness of our approach in achieving diverse eyeglasses styles while preserving irrelevant areas.
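As an illustration of how text and mask conditions could be injected together while leaving irrelevant areas untouched, the following is a minimal sketch and not the authors' implementation. It assumes a feature-wise scale-and-shift modulation driven by a text embedding and gated by the binary mask. The function `modulate`, the projection matrices `w_scale` and `w_shift`, and all tensor shapes are hypothetical and chosen only for this example.

```python
import numpy as np


def modulate(features, text_embedding, mask, w_scale, w_shift):
    """Apply text-driven modulation inside the mask only (illustrative sketch).

    features:       (C, H, W) feature map
    text_embedding: (D,) text condition vector
    mask:           (H, W) binary mask, 1 where the eyeglasses should be
    w_scale/w_shift: (C, D) hypothetical projections from text to channels
    """
    scale = w_scale @ text_embedding  # per-channel scale, shape (C,)
    shift = w_shift @ text_embedding  # per-channel shift, shape (C,)
    styled = features * (1.0 + scale[:, None, None]) + shift[:, None, None]
    m = mask.astype(features.dtype)[None, :, :]  # broadcast over channels
    # Inside the mask use the styled features; outside keep the originals.
    return m * styled + (1.0 - m) * features


# Toy usage with random data (assumed sizes, not taken from the paper).
rng = np.random.default_rng(0)
features = rng.standard_normal((8, 16, 16))
text_embedding = rng.standard_normal(32)
w_scale = 0.1 * rng.standard_normal((8, 32))
w_shift = 0.1 * rng.standard_normal((8, 32))
mask = np.zeros((16, 16), dtype=np.uint8)
mask[6:10, 2:14] = 1  # rough eyeglasses region

out = modulate(features, text_embedding, mask, w_scale, w_shift)
```

In this sketch the mask determines where the edit applies and the text embedding determines how the features change, which mirrors the division of roles described in the abstract. The actual mask encoder, mapper, and decoupling strategy are not specified here and may differ substantially.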