Digital Adoption Platforms (DAPs) provide web-based overlays that deliver operational guidance and contextual hints to help users navigate complex websites. Although modern DAP tools let non-experts author such guidance, maintaining the guides remains labor-intensive: website layouts and functionality evolve continuously, so guides require repeated manual updates and re-annotation. In this work, we introduce \textbf{GuideWeb}, a new benchmark for automatic in-app guide generation on real-world web UIs. GuideWeb formulates the task as producing page-level guidance by selecting \textbf{guide target elements} grounded in the webpage and generating concise guide text aligned with user intent. We also propose a comprehensive evaluation suite that jointly measures the accuracy of guide target element selection and the quality of the generated intents and guide texts. Experiments show that our proposed \textbf{GuideWeb Agent} achieves \textbf{30.79\%} accuracy in guide target element prediction, with BLEU scores of \textbf{44.94} for intent generation and \textbf{21.34} for guide-text generation. Existing baselines perform substantially worse, which indicates that automatic guide generation remains challenging and that further advances are needed before such systems can be reliably deployed in real-world settings.