We introduce a framework that automates the transformation of static anime illustrations into manipulable 2.5D models. Current professional workflows require tedious manual segmentation and the artistic ``hallucination'' of occluded regions to enable motion. Our approach overcomes this by decomposing a single image into fully inpainted, semantically distinct layers with inferred drawing orders. To address the scarcity of training data, we present a scalable engine that bootstraps high-quality supervision from commercial Live2D models, capturing pixel-perfect semantics and hidden geometry. Our method couples a diffusion-based Body Part Consistency Module, which enforces global geometric coherence, with a pixel-level pseudo-depth inference mechanism. Together they resolve the intricate stratification of anime characters, e.g., interleaved hair strands, and enable dynamic layer reconstruction. We demonstrate that our approach yields high-fidelity, manipulable models suitable for professional, real-time animation applications.
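The inferred drawing order described above amounts to back-to-front ``over'' compositing of the inpainted RGBA layers. The following is a minimal illustrative sketch, not the paper's actual pipeline; the function name and order convention (back-most layer first) are our assumptions:

```python
import numpy as np

def composite_layers(layers, order):
    """Back-to-front 'over' compositing of RGBA layers.

    layers: list of float arrays of shape (H, W, 4), values in [0, 1]
    order:  inferred drawing order, back-most layer index first
    """
    h, w, _ = layers[0].shape
    canvas = np.zeros((h, w, 4), dtype=np.float64)
    for idx in order:
        layer = layers[idx]
        a = layer[..., 3:4]  # source alpha of the current layer
        # 'over' operator: source occludes whatever is already on the canvas
        canvas[..., :3] = layer[..., :3] * a + canvas[..., :3] * (1.0 - a)
        canvas[..., 3:4] = a + canvas[..., 3:4] * (1.0 - a)
    return canvas
```

Because each layer is fully inpainted, re-ordering or warping individual layers (e.g., swinging a front hair strand over the face) and re-running this compositing step yields a coherent frame with no exposed holes.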