Multimodal Large Language Models (MLLMs) have demonstrated strong performance on the UI-to-code task, which aims to generate UI code from design mock-ups. However, when applied to long and complex websites, they often struggle with fragmented segmentation, redundant code generation for repetitive components, and frequent UI inconsistencies. To systematically investigate and address these challenges, we introduce ComUIBench, a new benchmark of complex multi-page websites with component annotations, designed to evaluate MLLMs' ability to generate reusable UI code in realistic website scenarios. Building upon this benchmark, we propose ComUICoder, a component-based UI code generation framework that emphasizes semantic-aware segmentation, code reuse, and fine-grained refinement. Specifically, ComUICoder incorporates (1) Hybrid Semantic-aware Block Segmentation to accurately detect semantically coherent UI blocks, (2) Visual-aware Graph-based Block Merge to consolidate structurally similar components within and across webpages into reusable implementations, and (3) Priority-based Element-wise Feedback to refine the generated code and reduce element-level inconsistencies. Extensive experiments demonstrate that ComUICoder significantly improves overall generation quality and code reusability on complex multi-page websites. Our datasets and code are publicly available at https://github.com/WebPAI/ComUICoder.
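
The abstract does not say how the graph-based block merge computes similarity or forms groups, so the following is only a minimal sketch of the general idea, not ComUICoder's implementation. It assumes each segmented block already has a numeric feature vector, links two blocks when their cosine similarity reaches a threshold, and treats each connected component of the resulting graph as one reusable component. The function name `merge_blocks`, the block identifiers, the feature vectors, and the 0.9 threshold are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def merge_blocks(features, threshold=0.9):
    """Group block ids into connected components of a similarity graph.

    `features` maps a block id to its (hypothetical) feature vector.
    An edge joins two blocks when their cosine similarity >= threshold.
    """
    parent = {block: block for block in features}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(features, 2):
        if cosine(features[a], features[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for block in features:
        groups.setdefault(find(block), []).append(block)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical blocks from two pages: three look-alike cards and one footer.
blocks = {
    "home/card_1": [1.0, 0.0, 0.2],
    "home/card_2": [0.9, 0.1, 0.2],
    "about/card_1": [1.0, 0.05, 0.25],
    "home/footer": [0.0, 1.0, 0.0],
}
print(merge_blocks(blocks))
```

In this toy input the three card blocks, including the one from a different page, end up in a single group that could be implemented once and reused, while the footer stays on its own.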