The rapid development of generative AI (GenAI) models in computer vision necessitates effective evaluation methods to ensure their quality and fairness. Existing tools primarily focus on dataset quality assurance and model explainability, leaving a significant gap in GenAI output evaluation during model development. Current practices often depend on developers' subjective visual assessments, which may lack scalability and generalizability. This paper bridges this gap by conducting a formative study with GenAI model developers in an industrial setting. Our findings led to the development of GenLens, a visual analytic interface designed for the systematic evaluation of GenAI model outputs during the early stages of model development. GenLens offers a quantifiable approach for overviewing and annotating failure cases, customizing issue tags and classifications, and aggregating annotations from multiple users to enhance collaboration. A user study with model developers reveals that GenLens effectively enhances their workflow, evidenced by high satisfaction rates and a strong intent to integrate it into their practices. This research underscores the importance of robust early-stage evaluation tools in GenAI development, contributing to the advancement of fair and high-quality GenAI models.