AI-assisted usability analysis can potentially reduce the time and effort required to find usability problems, yet little is known about how an AI's perceived expertise influences evaluators' analytic strategies and perceptions over time. We ran a within-subjects, five-session study (six hours per participant) with 12 professional UX evaluators who worked with two conversational assistants (CAs) designed to appear novice-like or expert-like (differing in suggestion quantity and response accuracy). We logged behavioral measures (number of video passes, suggestion acceptance rate), collected subjective ratings (trust, perceived efficiency), and conducted semi-structured interviews. Participants experienced an initial novelty effect followed by a dip in trust that recovered over time. Their efficiency improved as they shifted from a two-pass to a one-pass video inspection approach. Although evaluators did not perceive expertise differences early on, they ultimately rated the expert-like CA as significantly more efficient, trustworthy, and comprehensive. We conclude with design implications for adapting AI expertise to support calibrated human-AI collaboration.