Web agents have demonstrated strong performance on a wide range of web-based tasks. However, existing research on the effect of environmental variation has mostly focused on robustness to adversarial attacks, with less attention paid to agents' preferences in benign scenarios. Although early studies have examined how textual attributes influence agent behavior, a systematic understanding of how visual attributes shape agent decision-making remains limited. To address this gap, we introduce VAF, a controlled evaluation pipeline for quantifying how webpage Visual Attribute Factors influence web-agent decision-making. Specifically, VAF consists of three stages: (i) variant generation, which ensures that each variant shares the same semantics as the original item and differs only in visual attributes; (ii) browsing interaction, in which agents navigate the page by scrolling and clicking the item of interest, mirroring how human users browse online; and (iii) validation through both the agents' click actions and their reasoning, for which we use the Target Click Rate and Target Mention Rate to jointly evaluate the effect of visual attributes. By quantitatively measuring the difference in decision-making between the original and each variant, we identify which visual attributes influence agents' behavior most. Extensive experiments across 8 variant families (48 variants in total), 5 real-world websites (including shopping, travel, and news browsing), and 4 representative web agents show that background color contrast, item size, position, and card clarity strongly influence agents' actions, whereas font styling, text color, and item image clarity have only minor effects.
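As a rough illustration of how the two metrics named in the abstract might be computed, the sketch below assumes that each trial records which item the agent clicked and the text of the agent's reasoning. The `Trial` record, its field names, the substring-based mention check, and the definition of each rate as a simple fraction of trials are all hypothetical; they are not taken from the VAF implementation.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One agent run on one page (hypothetical record layout)."""
    target_id: str    # id of the item whose visual attributes were varied
    target_name: str  # display name of that item
    clicked_id: str   # id of the item the agent actually clicked
    reasoning: str    # free-text reasoning produced by the agent


def target_click_rate(trials):
    """Fraction of trials in which the agent clicked the target item (assumed definition)."""
    if not trials:
        return 0.0
    return sum(t.clicked_id == t.target_id for t in trials) / len(trials)


def target_mention_rate(trials):
    """Fraction of trials whose reasoning mentions the target by name (assumed definition)."""
    if not trials:
        return 0.0
    hits = sum(t.target_name.lower() in t.reasoning.lower() for t in trials)
    return hits / len(trials)


def attribute_effect(original, variant):
    """Variant-minus-original difference in both rates (assumed way to compare conditions)."""
    return {
        "delta_click_rate": target_click_rate(variant) - target_click_rate(original),
        "delta_mention_rate": target_mention_rate(variant) - target_mention_rate(original),
    }
```

Under these assumptions, a positive `delta_click_rate` would suggest that the visual change made the target more likely to be clicked, while `delta_mention_rate` gives a second signal drawn from the agent's stated reasoning instead of its action.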