Web agents have demonstrated strong performance on a wide range of web-based tasks. However, existing research on the effect of environmental variation has mostly focused on robustness to adversarial attacks, with less attention paid to agents' preferences in benign scenarios. Although early studies have examined how textual attributes influence agent behavior, a systematic understanding of how visual attributes shape agent decision-making remains limited. To address this, we introduce VAF, a controlled evaluation pipeline for quantifying how webpage Visual Attribute Factors influence web-agent decision-making. Specifically, VAF consists of three stages: (i) variant generation, which ensures that each variant shares the same semantics as the original item and differs only in visual attributes; (ii) browsing interaction, in which agents navigate the page by scrolling and clicking the item of interest, mirroring how human users browse online; and (iii) validation through both the agents' click actions and their reasoning, for which we use the Target Click Rate and the Target Mention Rate to jointly evaluate the effect of visual attributes. By quantitatively measuring the difference in decision-making between the original and each variant, we identify which visual attributes influence agents' behavior most. Extensive experiments across 8 variant families (48 variants in total), 5 real-world websites (covering shopping, travel, and news browsing), and 4 representative web agents show that background color contrast, item size, position, and card clarity strongly influence agents' actions, whereas font styling, text color, and item image clarity have only minor effects.
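As a rough illustration of the validation stage, the sketch below computes a click rate and a mention rate over a batch of agent trials and compares an original page against a variant. The abstract does not give formal definitions, so everything here is an assumption: the `Trial` record, its field names, the sample data, the substring-based mention check, and the use of a simple rate difference as the effect size are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One agent episode on a page with a single target item (hypothetical record)."""
    clicked_item: str  # id of the item the agent clicked
    reasoning: str     # free-text reasoning emitted by the agent
    target_item: str   # id of the item whose visual attribute is varied
    target_name: str   # display name used to detect mentions in the reasoning


def target_click_rate(trials):
    """Fraction of trials in which the agent clicked the target item."""
    if not trials:
        return 0.0
    return sum(t.clicked_item == t.target_item for t in trials) / len(trials)


def target_mention_rate(trials):
    """Fraction of trials whose reasoning names the target (naive substring match)."""
    if not trials:
        return 0.0
    hits = sum(t.target_name.lower() in t.reasoning.lower() for t in trials)
    return hits / len(trials)


def effect(original, variant, metric):
    """Assumed effect size: metric on the variant minus metric on the original."""
    return metric(variant) - metric(original)


def make(clicked, reasoning):
    # Helper for the made-up sample data: the target is always item_1, "Acme Mug".
    return Trial(clicked, reasoning, "item_1", "Acme Mug")


# Made-up trials, for illustration only.
original = [
    make("item_1", "The Acme Mug is the cheapest option."),
    make("item_2", "I compared the Acme Mug with others and chose another."),
    make("item_3", "The third item has the best rating."),
    make("item_2", "The second item ships fastest."),
]
variant = [
    make("item_1", "The Acme Mug stands out on the page."),
    make("item_1", "I pick the acme mug."),
    make("item_1", "Acme Mug looks like the best fit."),
    make("item_2", "The second item ships fastest."),
]

click_effect = effect(original, variant, target_click_rate)
mention_effect = effect(original, variant, target_mention_rate)
print(click_effect, mention_effect)
```

Reporting the two rates side by side separates what the agent did from what it said it attended to; a real pipeline would likely need a more careful mention detector than a substring match.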