Despite their increasing capabilities, text-to-image generative AI systems are known to produce biased, offensive, and otherwise problematic outputs. While recent advancements have supported testing and auditing of generative AI, existing auditing methods still struggle to help auditors effectively explore the vast space of AI-generated outputs in a structured way. To address this gap, we conducted formative studies with five AI auditors and synthesized five design goals for supporting systematic AI audits. Based on these insights, we developed Vipera, an interactive auditing interface that employs multiple visual cues, including a scene graph, to facilitate image sensemaking and inspire auditors to explore and hierarchically organize auditing criteria. Additionally, Vipera leverages LLM-powered suggestions to facilitate exploration of unexplored auditing directions. Through a controlled experiment with 24 participants experienced in AI auditing, we demonstrate Vipera's effectiveness in helping auditors navigate large AI output spaces and organize their analyses while engaging with diverse criteria.