UI-to-Code generation requires vision-language models (VLMs) to produce thousands of tokens of structured HTML/CSS from a single screenshot, which makes visual token efficiency critical. Existing compression methods either select tokens at inference time using task-agnostic heuristics or zero out low-attention features without actually shortening the sequence; neither truly reduces prefill latency nor adapts to the non-uniform information density of UI screenshots. Meanwhile, optical (encoder-side, learned) compression has shown strong results for document OCR, yet no prior work has adapted this paradigm to UI-to-Code generation. We propose UIPress, a lightweight learned compression module inserted between the frozen ViT encoder and the LLM decoder of Qwen3-VL-8B. UIPress combines depthwise-separable convolutions, element-guided spatial reweighting, and Transformer refinement to compress ${\sim}$6{,}700 visual tokens into a fixed budget of 256. Together with Low-Rank Adaptation (LoRA) on the decoder to bridge the representation gap introduced by compression, the entire system adds only ${\sim}$21.7M trainable parameters (0.26\% of the 8B base model). In a fair comparison against four baselines on Design2Code, all built on the same base model, UIPress at 256 tokens achieves a CLIP score of 0.8127, outperforming the uncompressed baseline by 7.5\% and the strongest inference-time method by 4.6\%, while delivering a 9.1$\times$ speedup in time-to-first-token (TTFT). To the best of our knowledge, UIPress is the first encoder-side learned compression method for the UI-to-Code task.
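The abstract names the three stages of the compressor but not their configuration. The NumPy sketch below illustrates one way such a pipeline could turn a ViT token grid of about 6,700 tokens into a fixed 256-token budget. It is not the UIPress implementation. The following are all illustrative assumptions: the $3 \times 3$ depthwise kernel, the binary "element mask" standing in for element guidance, adaptive average pooling to a $16 \times 16$ grid as the length-reducing step, a single attention block standing in for "Transformer refinement", the toy channel width, and every function name (e.g. `uipress_like_compress`). The decoder-side LoRA adapters are not shown.

```python
import numpy as np


def depthwise_separable_conv(x, dw_kernel, pw_weight):
    """x: (H, W, C); dw_kernel: (k, k, C), one k x k filter per channel; pw_weight: (C, C_out)."""
    k = dw_kernel.shape[0]
    pad = k // 2  # "same" padding for odd k
    H, W, _ = x.shape
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    out = np.zeros_like(x)
    # Depthwise step: each channel is filtered independently by its own kernel.
    for i in range(k):
        for j in range(k):
            out += xp[i:i + H, j:j + W, :] * dw_kernel[i, j, :]
    # Pointwise step: a 1x1 convolution (a matmul over channels) mixes channels.
    return out @ pw_weight


def element_reweight(x, element_mask, alpha=1.0):
    """Hypothetical element guidance: upweight grid cells with a high element score in [0, 1]."""
    return x * (1.0 + alpha * element_mask)[..., None]


def adaptive_avg_pool(x, out_h, out_w):
    """Average-pool an (H, W, C) grid to (out_h, out_w, C); this step shortens the sequence."""
    H, W, C = x.shape
    row_bins = np.array_split(np.arange(H), out_h)
    col_bins = np.array_split(np.arange(W), out_w)
    pooled = np.empty((out_h, out_w, C), dtype=x.dtype)
    for a, r in enumerate(row_bins):
        for b, c in enumerate(col_bins):
            pooled[a, b] = x[r[0]:r[-1] + 1, c[0]:c[-1] + 1].mean(axis=(0, 1))
    return pooled


def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def attention_refine(tokens, wq, wk, wv, wo):
    """One pre-norm, single-head self-attention block with a residual connection."""
    h = layer_norm(tokens)
    q, k, v = h @ wq, h @ wk, h @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability for softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return tokens + (attn @ v) @ wo


def uipress_like_compress(vis_tokens, grid_hw, element_mask, params, out_grid=(16, 16)):
    """Map (H*W, C) visual tokens to (out_h*out_w, C) tokens for the decoder."""
    H, W = grid_hw
    x = vis_tokens.reshape(H, W, -1)
    x = depthwise_separable_conv(x, params["dw"], params["pw"])
    x = element_reweight(x, element_mask)
    x = adaptive_avg_pool(x, *out_grid)
    tokens = x.reshape(-1, x.shape[-1])
    return attention_refine(tokens, params["wq"], params["wk"], params["wv"], params["wo"])


rng = np.random.default_rng(0)
H, W, C = 82, 82, 64  # 82 * 82 = 6,724 tokens (~6,700); C is a toy width
params = {
    "dw": rng.normal(size=(3, 3, C)) * 0.1,
    **{name: rng.normal(size=(C, C)) / np.sqrt(C) for name in ("pw", "wq", "wk", "wv", "wo")},
}
vis = rng.normal(size=(H * W, C))
mask = (rng.random((H, W)) > 0.7).astype(float)  # hypothetical element map
out = uipress_like_compress(vis, (H, W), mask, params)
print(vis.shape, "->", out.shape)  # (6724, 64) -> (256, 64)
```

Whatever the actual design, the sequence must be shortened before the decoder for prefill cost to drop. Here 6,724 tokens become 256 (roughly 26$\times$ fewer). Zeroing low-attention features in place, by contrast, leaves the decoder's sequence length unchanged.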