Optimizing ID Consistency in Multimodal Large Models: Facial Restoration via Alignment, Entanglement, and Disentanglement

Large multimodal editing models have demonstrated powerful editing capabilities across diverse tasks. However, a long-standing limitation is the decline in facial identity (ID) consistency during realistic portrait editing. Because the human eye is highly sensitive to facial features, such inconsistency significantly hinders the practical deployment of these models. Current facial ID preservation methods struggle to consistently restore both the facial identity and the edited element IP because of two problems: Cross-source Distribution Bias and Cross-source Feature Contamination. To address these problems, we propose EditedID, an Alignment-Disentanglement-Entanglement framework for robust identity-specific facial restoration. By systematically analyzing diffusion trajectories, sampler behaviors, and attention properties, we introduce three key components: 1) an adaptive mixing strategy that aligns cross-source latent representations throughout the diffusion process; 2) a hybrid solver that disentangles source-specific identity attributes and details; and 3) an attentional gating mechanism that selectively entangles visual elements. Extensive experiments show that EditedID achieves state-of-the-art performance in preserving both the original facial ID and the edited element IP. As a training-free, plug-and-play solution, it establishes a new benchmark for practical and reliable single- and multi-person facial identity restoration in open-world settings, paving the way for the deployment of large multimodal editing models in real-person editing scenarios. The code is available at https://github.com/NDYBSNDY/EditedID.
