We present a vision-language model (VLM) that automatically edits website HTML to fix violations of the Web Content Accessibility Guidelines 2 (WCAG 2) while preserving the original design. We formulate the task as supervised image-conditioned program synthesis, in which the model learns to correct HTML given both the code and its visual rendering. We introduce WebAccessVL, a dataset of websites with manually corrected accessibility violations. We then propose a violation-conditioned VLM that additionally takes as input the violation descriptions produced by an accessibility checker. This conditioning enables an iterative checker-in-the-loop refinement strategy at test time. We conduct extensive evaluation against both API-based and open-weight models. Empirically, our method achieves 0.211 violations per website, a 96.0\% reduction from the 5.34 violations in the raw data and 87\% fewer than GPT-5. A perceptual study further confirms that our edited websites better preserve the original visual appearance and content.
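The checker-in-the-loop refinement described above can be sketched as a simple loop: run the accessibility checker, feed the resulting violation descriptions (together with the HTML) back to the model, and repeat until the checker passes or an iteration budget is exhausted. The sketch below is a minimal illustration only: `mock_checker` and `mock_vlm_fix` are hypothetical stand-ins for a real WCAG checker (e.g. axe-core) and the violation-conditioned VLM, which the abstract does not specify in code form.

```python
# Hedged sketch of iterative checker-in-the-loop refinement.
# Both helpers are stand-ins (hypothetical), not the paper's implementation:
# a real pipeline would invoke an accessibility checker and a VLM.

def mock_checker(html: str) -> list[str]:
    """Return WCAG-style violation descriptions for a tiny subset of rules."""
    violations = []
    if "<img" in html and "alt=" not in html:
        violations.append("Images must have an alt attribute (WCAG 1.1.1).")
    if "<html>" in html and "lang=" not in html:
        violations.append("The <html> element must have a lang attribute (WCAG 3.1.1).")
    return violations

def mock_vlm_fix(html: str, violations: list[str]) -> str:
    """Stand-in for the violation-conditioned VLM: apply one fix per call,
    conditioned on the checker's violation descriptions."""
    if any("alt attribute" in v for v in violations):
        return html.replace("<img", '<img alt=""', 1)
    if any("lang attribute" in v for v in violations):
        return html.replace("<html>", '<html lang="en">', 1)
    return html

def refine(html: str, max_rounds: int = 5) -> tuple[str, list[str]]:
    """Check, condition the model on the violations, re-check; stop when clean."""
    for _ in range(max_rounds):
        violations = mock_checker(html)
        if not violations:
            break
        html = mock_vlm_fix(html, violations)
    return html, mock_checker(html)

page = "<html><body><img src='logo.png'></body></html>"
fixed, remaining = refine(page)
```

At test time the same loop applies: each round the model sees only the violations that remain, so later edits target residual errors rather than re-editing the whole page.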